Code: Setup
library(tidyverse)
library(kableExtra)
library(rstatix)
library(ggpubr)
library(knitr)
library(broom)
library(DT)
library(janitor)
library(emmeans)
library(tippy)
library(ggfortify)
library(lmerTest)

# Colours used in the plots
orange = "#d85f33"
lightorange = "#fcbaa2"

# Hover tooltips for the statistical terms used in the report
tippy::tippy_this(elementId = "power", tooltip = "The ability of a statistical test to detect a significant result, given that the significant result actually exists.")
tippy::tippy_this(elementId = "indep", tooltip = "The values of one datapoint (ROI) do not depend on any other datapoint (any other ROI).")
tippy::tippy_this(elementId = "homoskedasticity", tooltip = "The variance of the error term (residuals) is constant across observations.")
1 Executive Summary
The client presents a study investigating cellular phenotypes present in two cardiovascular disease Aetiologies, Dilated Cardiomyopathy (DCM) and Ischemic Cardiomyopathy (ICM). Based on a sample of cellular marking results from 16 patients (8 within each Aetiology) and a conversation with the client, three key goals were identified. Using the proportion of phenotype cells in a given ROI, this statistical investigation found the following results for each goal:
Compare cellular phenotypes present within different Fibrotic Zones within ICM patients and within DCM patients.
Within each Aetiology, the following cellular phenotypes demonstrated a significant difference in phenotype proportion between Fibrotic zones:
Compare cellular phenotypes present within different Fibrotic Zones between ICM and DCM patients.
For the IF Fibrotic Zone, mean phenotype proportion was significantly different between ICM and DCM samples for the M2, LymphEndo, Fib3 and C3b phenotypes. In the RF Fibrotic Zone, mean phenotype proportion was significantly different for the Fib3, LymphEndo and C3b phenotypes.
Compare the relationship between phenotype abundance and Fibrosis score within ICM and DCM patients.
The relationship between phenotype proportion and Fibrosis Score was significant for the following phenotypes within each aetiology:
Cardiovascular disease (CVD) is a leading cause of death and disability worldwide. To diagnose and combat CVD, the client has designed a 38-parameter panel of antibodies for single-cell expression profiling and spatial mapping of the myocardial microenvironment. This process was aided by machine learning tools which have been optimised to the topology of the human heart. The data the client has collected has been sourced from 16 total patients, and was collected from multiple physiological zones within the myocardium. These patients all have experienced heart failure, derived from either Dilated Cardiomyopathy (DCM) or Ischemic Cardiomyopathy (ICM).
ICM involves left ventricle dilation caused by vessel disease, while DCM is left ventricle dilation caused by non-vessel disease and is influenced by genetics and excessive alcohol consumption. Collagen within the heart is known as scar tissue, while myocardium represents healthy muscle tissue. This distinction is vital in determining the Fibrotic Zone of a tissue sample, which is addressed in further depth in the following sections. Four Fibrotic Zones have been identified: Remote, IF, RF and RFC. These correspond to an increasing abundance of fibrotic tissue within a sample.
3 Goals of Analysis
Through analysis of the client’s background information, the client consultation, and the client’s data, the consulting team identified 3 key goals:
3.1 Key Goal 1
Compare the proportion of cellular phenotypes present within different Fibrotic Zones within ICM patients and within DCM patients.
3.2 Key Goal 2
Compare the proportion of cellular phenotypes present within different Fibrotic Zones between ICM and DCM patients.
3.3 Key Goal 3
Compare the relationship between phenotype abundance and fibrosis score within DCM and ICM patients.
4 Datasets and Considerations
4.1 Datasets
The datasets provided by the client represent heart samples from 16 patients, 8 with ICM and 8 with DCM. The data was compiled by our client using machine learning techniques catered to her research goals. There are two datasets utilized within this report: a set referencing sample and area information and a set compiling data on single cells.
The dataset compiling sample and area information is composed of 8 columns:
ROI Data Column Summary
ROI: Region of interest
Sample: Specifies the patient the sample is pulled from
Group: Denotes the zone of the sample, as well as the sample’s disease
Batch: Denotes the batch associated with the sample
Aetiology: Whether the sample is associated with a patient with ICM or DCM
Scar: Area of scarring
Myocardium: Area of myocardium
Background: Background area due to imaging gaps, which is ignored in the calculation of the area. Throughout this report, area refers to the sum of the scar area and the myocardium area, with the fibrosis score being the percentage of scar area compared to the total (non-background) area.
The dataset compiling data on single cells has 57 columns:
Single Cell Data Column Summary
ROI: Region of Interest
ID: The ID number assigned to the cell
X: The x-coordinate of the pixel, with each pixel measuring one micron by one micron
Y: The y-coordinate of the pixel, with each pixel measuring one micron by one micron
Area: Area of the pixel
Cell Type: A numerical organisation of cells, which is not relevant to the analysis
Regions: A numerical organisation of regions, which is not relevant to the analysis
Annotated Cell Type: Refers to the type of cell the sample is pulled from, all of which are cells
Annotated Region: Whether the single cell is a myocardium cell or a scar cell
Markers: Percentage coverage for each type of biological marker within a pixel
Annotated Metacluster: Labeling the various cellular phenotypes
Sample: Specifies the patient the sample is pulled from
Group: Denotes the zone of the sample, as well as the sample’s disease
Batch: Denotes the batch associated with the sample
Aetiology: Whether the sample is associated with a patient with ICM or DCM
4.2 Considerations
One significant consideration that we had to navigate during this report was the relatively small sample size. There are 16 patients in this dataset, with 92 Regions of Interest (ROIs) in total. Averaged across patients, this is approximately 6 ROIs per patient.
Initially, we determined this would be too small a sample size for a mixed model, which would have required splitting the given data into 16 groups, one for each patient. We believed that the power of the resulting tests would have been too low to detect significant results effectively. However, our academic advisor assured us that a linear mixed model was appropriate for addressing Key Goals 1 and 3 (Section 3.1 and Section 3.3).
Another consideration was independence, which is an assumption of many statistical tests. It can be argued that ROIs are not independent of each other if they come from the same patient. One way to account for this lack of independence is to use a mixed model, which we applied for Key Goals 1 and 3.
Normality is another assumption of a \(t\)-test. After a visual assessment of the data, we found that there was not enough evidence to conclude that the assumption of normality is violated.
We were concerned about potential issues with homoskedasticity. However, after consulting with our academic lead, we determined that this issue does not affect the results.
We did notice some high-leverage points for Neutrophil1 (Section 5.3). Removing these points does alter the results, so they should be considered carefully when interpreting the Neutrophil1 results.
5 Analysis
Code: Initial Data Cleaning and Transformation
patient_df = read_csv('data/ROI.GROUP.AREA.data.csv') |>
  clean_names()

# Fibrosis Score Calculation
patient_df = patient_df |>
  mutate(fibrotic_score = scar/(scar + myocardium)) |>
  mutate(fibrotic_zone = str_extract(group, "^[^_]*")) |>
  mutate(fibrotic_zone = factor(fibrotic_zone, levels=c('Remote', "IF", 'RF', "RFC"))) |>
  mutate(patient_id = str_extract(sample, "[^_]+$")) |>
  rename(roi=roi_1)

# Mapping Table between Patients, ROIs and Fibrotic Zone
patient_id_roi_fibrotic_zone_mapping_df = patient_df |>
  select(patient_id, roi, fibrotic_zone) |>
  unique()

cell_df = read_csv('data/HFCellPop.csv') |>
  clean_names()

# Cell counts in each ROI
roi_total_cells_count_df = cell_df |>
  select(roi) |>
  group_by(roi) |>
  summarise(roi_cell_count = n()) |>
  ungroup()

# Proportion of each Phenotype in each ROI
cell_proportion_df = cell_df |>
  select(roi, annotated_metacluster, sample, group, aetiology) |>
  group_by(roi, annotated_metacluster, sample, group, aetiology) |>
  summarise(phenotype_cell_count = n()) |>
  ungroup() |>
  mutate(fibrotic_zone = str_extract(group, "^[^_]*")) |>
  mutate(fibrotic_zone = factor(fibrotic_zone, levels=c('Remote', "IF", 'RF', "RFC"))) |>
  mutate(patient_id = str_extract(sample, "[^_]+$")) |>
  left_join(roi_total_cells_count_df) |>
  mutate(phenotype_cell_proportion = phenotype_cell_count/roi_cell_count)

# Simplified dataframe of cell score and cell proportion for ROI
proportion_score_df = cell_proportion_df |>
  left_join(patient_df |> select(roi, fibrotic_score, fibrotic_zone))

# ICM only Dataframe
icm_disease_cell_proportion_df = cell_proportion_df |>
  filter(aetiology == 'ICM')
annotated_metaclusters = cell_df |>
  select(annotated_metacluster) |>
  unique()
rois = icm_disease_cell_proportion_df |>
  select(roi) |>
  unique()
icm_df = data.frame(roi=c(), phenotype=c(), phenotype_proportion=c())

# One row per ROI+phenotype pair, with value of phenotype proportion
for (phenotype_loop in as.list(annotated_metaclusters$annotated_metacluster)) {
  for (roi_loop in as.list(rois$roi)) {
    df = icm_disease_cell_proportion_df |>
      filter(annotated_metacluster == phenotype_loop, roi == roi_loop) |>
      select(phenotype_cell_proportion)
    phenotype_proportion_for_roi = df$phenotype_cell_proportion[1]
    icm_df = rbind(icm_df, data.frame(roi=c(roi_loop), phenotype=c(phenotype_loop), phenotype_proportion=c(phenotype_proportion_for_roi)))
  }
}

# Fills in empty rows with 0 (phenotype not present in ROI)
icm_df = icm_df |>
  replace_na(list(phenotype_proportion=0))

# Adds in patient and Fibrotic Zone info
icm_df = icm_df |>
  left_join(patient_id_roi_fibrotic_zone_mapping_df)

# DCM only Dataframe
dcm_disease_cell_proportion_df = cell_proportion_df |>
  filter(aetiology == 'DCM')
rois = dcm_disease_cell_proportion_df |>
  select(roi) |>
  unique()
dcm_df = data.frame(roi=c(), phenotype=c(), phenotype_proportion=c())

# One row per ROI+phenotype pair, with value of phenotype proportion
for (phenotype_loop in as.list(annotated_metaclusters$annotated_metacluster)) {
  for (roi_loop in as.list(rois$roi)) {
    df = dcm_disease_cell_proportion_df |>
      filter(annotated_metacluster == phenotype_loop, roi == roi_loop) |>
      select(phenotype_cell_proportion)
    dcm_df = rbind(dcm_df, data.frame(roi=c(roi_loop), phenotype=c(phenotype_loop), phenotype_proportion=c(df$phenotype_cell_proportion[1])))
  }
}

# Fills in empty rows with 0 (phenotype not present in ROI)
dcm_df = dcm_df |>
  replace_na(list(phenotype_proportion=0))

# Adds in patient and Fibrotic Zone info
dcm_df = dcm_df |>
  left_join(patient_id_roi_fibrotic_zone_mapping_df)
To meet the key goals of this report, we investigated the proportion of cells that a particular phenotype took up within an ROI. If a phenotype was not present in an ROI, it was taken that this phenotype took up \(0\%\) of the ROI. Phenotype proportion within an ROI was defined as follows: \[\text{Phenotype Proportion} = \frac{\text{\# Cells of Given Phenotype}}{\text{\# Cells in the ROI}}\]
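The zero-filling rule can be illustrated on a toy example. The sketch below uses base R only; the data frame `cells` and its values are hypothetical stand-ins for the single-cell dataset, not the client's data, and it is not the code used in this report. It relies on the fact that `table()` keeps ROI and phenotype pairs with no cells as explicit zero counts.

```r
# Hypothetical single-cell records: one row per cell (illustrative values only)
cells <- data.frame(
  roi = c("roi_1", "roi_1", "roi_1", "roi_1", "roi_2", "roi_2"),
  annotated_metacluster = c("Fib1", "Fib1", "M2", "Endo", "Fib1", "Endo")
)

# Count cells for every ROI x phenotype pair; absent pairs appear with count 0
counts <- as.data.frame(
  table(roi = cells$roi, phenotype = cells$annotated_metacluster),
  responseName = "phenotype_cell_count",
  stringsAsFactors = FALSE
)

# Total number of cells in each ROI
roi_totals <- as.data.frame(
  table(roi = cells$roi),
  responseName = "roi_cell_count",
  stringsAsFactors = FALSE
)

# Phenotype proportion = cells of the phenotype / all cells in the ROI
proportions <- merge(counts, roi_totals, by = "roi")
proportions$phenotype_proportion <-
  proportions$phenotype_cell_count / proportions$roi_cell_count

print(proportions)
```

In this toy data M2 is absent from `roi_2`, so it receives a proportion of 0 there, and the proportions within each ROI sum to 1.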
For Key Goal 3, we also investigated the Fibrosis Score of each ROI. This is defined as the scar area divided by the total tissue area of the ROI (scar plus myocardium, not including background from the cellular marker), as below: \[\text{Fibrosis Score} = \frac{\text{Scar area}}{\text{Scar area} + \text{Myocardium area}}\]
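As a small worked example of the Fibrosis Score, the sketch below applies the same calculation as the data-cleaning code, `scar/(scar + myocardium)`, to two hypothetical ROIs. The area values are invented for illustration; the point is that the background area does not enter the score.

```r
# Hypothetical ROI areas (illustrative values only, arbitrary area units)
rois <- data.frame(
  roi        = c("roi_1", "roi_2"),
  scar       = c(250, 100),
  myocardium = c(750, 300),
  background = c(40, 500)   # ignored: imaging gaps are not tissue
)

# Total (non-background) area and fibrosis score per ROI
rois$area <- rois$scar + rois$myocardium
rois$fibrotic_score <- rois$scar / rois$area

print(rois)
```

Both ROIs have a score of 0.25 even though their background areas differ greatly.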
5.1 Key Goal 1 Analysis
Key Goal 1: Compare the proportion of cellular phenotypes present within different Fibrotic Zones within ICM patients and within DCM patients.
Statistical Approach: For a given phenotype and a given Aetiology, create two Linear Mixed Models (LMM) to predict Phenotype Proportion. Each model includes the random effect of each patient, to account for the fact that each observation is not independent. One of these models, the ‘full model’ includes the effect of Fibrotic Zone: \[\text{Phenotype Proportion} \sim 1 + \text{Fibrotic Zone} + (1|\text{Patient ID})\]
The other model, the ‘null model’, does not include the effect of Fibrotic Zone: \[\text{Phenotype Proportion} \sim 1 + (1|\text{Patient ID})\]
These two LMMs are compared using ANOVA (a chi-squared test on the difference in deviance between the two models) to determine whether Fibrotic Zone has a significant effect on Phenotype Proportion. A significant result means that Phenotype Proportion differs significantly across the Fibrotic Zones within that aetiology. Each comparison is visualised below with a boxplot.
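The model comparison can be checked by hand from a reported table. The sketch below, in base R, takes the deviances and parameter counts reported for C1 within ICM (Section 5.1.1) and reproduces the Chi² statistic, its degrees of freedom and the AIC values. The variable names are illustrative. Note that `anova()` on two `lmer` fits refits both models with maximum likelihood by default before comparing them.

```r
# Values reported for C1 within ICM (null model vs full model)
dev_null <- -96.8    # deviance of the null model
dev_full <- -121.3   # deviance of the full model
k_null   <- 3        # parameters: intercept, patient variance, residual variance
k_full   <- 6        # adds three Fibrotic Zone coefficients

# Test statistic: drop in deviance when Fibrotic Zone is added
chi_sq <- dev_null - dev_full
df     <- k_full - k_null

# Upper-tail probability of a chi-squared distribution with df degrees of freedom
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# AIC = deviance + 2 * number of parameters
aic_null <- dev_null + 2 * k_null
aic_full <- dev_full + 2 * k_full

cat("Chi-squared:", chi_sq, "on", df, "df; p-value:", format.pval(p_value), "\n")
cat("AIC (null, full):", aic_null, aic_full, "\n")
```

The statistic of 24.5 on 3 degrees of freedom and the AIC values of -90.8 and -109.3 agree with the reported table, and the p-value is below 0.0001, which is why the table shows it rounded to 0.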
Statistical Assumptions: For an LMM, it is assumed that the data is linearly related and error terms are independent, normally distributed and have constant variance. Visual inspection of the data through exploratory analysis ensured these assumptions were satisfied. In the ANOVA of these two LMMs, the assumptions of independence, normality and homogeneity of variance are met.
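A minimal sketch of how the two models and the residual checks could be produced is given below. It uses simulated stand-in data, not the client's data, and calls `lme4::lmer` directly rather than the `lmerTest` version loaded in this report. The effect sizes, the `sim` data frame and the `diagnostics` data frame are all assumptions made for illustration.

```r
library(lme4)

set.seed(1)

# Simulated stand-in data: 8 patients x 4 zones x 2 ROIs (NOT the client's data)
zones <- c("Remote", "IF", "RF", "RFC")
sim <- expand.grid(
  patient_id    = factor(1:8),
  fibrotic_zone = factor(zones, levels = zones),
  replicate     = 1:2
)
patient_effect <- rnorm(8, sd = 0.02)        # assumed patient-level shifts
zone_effect    <- c(0, 0.02, 0.04, 0.06)     # assumed zone-level shifts
sim$phenotype_cell_proportion <- 0.10 +
  zone_effect[as.integer(sim$fibrotic_zone)] +
  patient_effect[as.integer(sim$patient_id)] +
  rnorm(nrow(sim), sd = 0.02)

# Full and null models with a random intercept per patient, fitted by ML
m_full <- lmer(phenotype_cell_proportion ~ 1 + fibrotic_zone + (1 | patient_id),
               data = sim, REML = FALSE)
m_null <- lmer(phenotype_cell_proportion ~ 1 + (1 | patient_id),
               data = sim, REML = FALSE)

# Chi-squared comparison of the two models
lrt <- anova(m_null, m_full)
print(lrt)

# Residuals and fitted values for visual assumption checks
diagnostics <- data.frame(fitted = fitted(m_full), residual = resid(m_full))

# Constant variance: residuals should scatter evenly around zero
plot(diagnostics$fitted, diagnostics$residual,
     xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)

# Normality: points should lie close to the reference line
qqnorm(diagnostics$residual)
qqline(diagnostics$residual)
```

A funnel shape in the first plot would suggest non-constant variance, and strong curvature in the second would suggest non-normal residuals.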
Code: Linear Mixed Model ANOVA of Phenotype Abundance throughout Fibrotic Zones per Phenotype
kg1_res_ls = list()     # list for all results
kg1_sig_res_ls = list() # list for significant results

# For each phenotype within each aetiology (ICM or DCM)
for (aetiolog in unique(cell_proportion_df$aetiology)) {
  # Index the data frame by its own aetiology column
  aetiology_df = proportion_score_df[proportion_score_df$aetiology == aetiolog, ]
  for (phenotype in unique(aetiology_df$annotated_metacluster)) {
    phenotype_df = aetiology_df[aetiology_df$annotated_metacluster == phenotype, ]
    if (nrow(phenotype_df) > 1) { # if more than one observation for phenotype and aetiology
      m_full = lmer(phenotype_cell_proportion ~ 1 + fibrotic_zone + (1|patient_id), data = phenotype_df)
      m_null = lmer(phenotype_cell_proportion ~ 1 + (1|patient_id), data = phenotype_df)
      # Compare null (not including fibrotic zone) and full (including fibrotic zone) models by ANOVA
      a = anova(m_null, m_full)
      # Store ANOVA results as dataframe
      res = a |> as.data.frame()
      # Side-by-side boxplot
      p = phenotype_df |>
        ggplot() +
        aes(x=fibrotic_zone, y=phenotype_cell_proportion) +
        geom_boxplot() +
        # Individual Data Points
        geom_jitter(size=1, colour=orange, alpha=0.5, height=0, width=0.1) +
        labs(title=paste0(phenotype)) +
        labs(x="Fibrotic Zone", y="Phenotype Proportion") +
        scale_y_continuous(labels = scales::percent) +
        theme(plot.background = element_rect(fill = "#ffffff", linewidth = 0),
              panel.border = element_rect(colour = "black", fill=NA),
              legend.box.background = element_rect(colour = "black"),
              axis.title = element_text(face="bold"),
              plot.title = element_text(face="bold", size = 14, hjust = 0.5))
      rownames(res) = c("Null Model", "Full Model")
      colnames(res) = c("Num. Par.", "AIC", "BIC", "log Lik.", "Deviance", "Chi²", "Df", "Pr(>Chi²)")
      # Store ANOVA results table and boxplot in list labelled under Phenotype name
      kg1_res_ls[[paste(aetiolog, phenotype)]] = list(table = kable(res, digits = c(0,1,1,1,1,1,0,4)), plot = p)
      # Add ANOVA results table and boxplot to list for significant results if ANOVA is significant
      if (res$`Pr(>Chi²)`[2] < 0.05) {
        kg1_sig_res_ls[[paste(aetiolog, phenotype)]] = list(table = kable(res, digits = c(0,1,1,1,1,2,0,4)), plot = p)
      }
    }
  }
}

kg1_res_ls_icm = kg1_res_ls[grep("ICM", names(kg1_res_ls))]             # ICM results list
kg1_res_ls_dcm = kg1_res_ls[grep("DCM", names(kg1_res_ls))]             # DCM results list
kg1_sig_res_ls_icm = kg1_sig_res_ls[grep("ICM", names(kg1_sig_res_ls))] # ICM significant results list
kg1_sig_res_ls_dcm = kg1_sig_res_ls[grep("DCM", names(kg1_sig_res_ls))] # DCM significant results list
5.1.1 Within ICM
Within the ICM Aetiology, the effect of Fibrotic Zone on Phenotype Proportion was significant in the C1, C2b, C3a, CD45RO, Fib2, Fib3, Fibrocyte2, M2, PopSMA+Fx13a+, ResMac, SMA1, SMA3, C2a, Activated Th and SMA2 cellular phenotypes.
These significant results are shown below. Results for all phenotypes can be found under the ‘All Results’ tab below.
Figure: Side-by-side boxplot of Phenotype Proportion of C1 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for C1
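| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -90.8 | -85.2 | 48.4 | -96.8 | NA | NA | NA |
| Full Model | 6 | -109.3 | -98.0 | 60.6 | -121.3 | 24.51 | 3 | <0.0001 |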
C2b
Figure: Side-by-side boxplot of Phenotype Proportion of C2b against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for C2b
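| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -255.2 | -250.5 | 130.6 | -261.2 | NA | NA | NA |
| Full Model | 6 | -267.5 | -258.2 | 139.7 | -279.5 | 18.32 | 3 | 0.0004 |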
C3a
Figure: Side-by-side boxplot of Phenotype Proportion of C3a against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for C3a
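| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -118.5 | -113.6 | 62.3 | -124.5 | NA | NA | NA |
| Full Model | 6 | -125.7 | -115.9 | 68.8 | -137.7 | 13.17 | 3 | 0.0043 |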
CD45RO
Figure: Side-by-side boxplot of Phenotype Proportion of CD45RO against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for CD45RO
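| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -196.0 | -190.7 | 101.0 | -202.0 | NA | NA | NA |
| Full Model | 6 | -205.9 | -195.3 | 108.9 | -217.9 | 15.87 | 3 | 0.0012 |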
Fib2
Figure: Side-by-side boxplot of Phenotype Proportion of Fib2 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Fib2
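| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -128.2 | -123.3 | 67.1 | -134.2 | NA | NA | NA |
| Full Model | 6 | -131.5 | -121.7 | 71.7 | -143.5 | 9.31 | 3 | 0.0255 |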
Fib3
Figure: Side-by-side boxplot of Phenotype Proportion of Fib3 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Fib3
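| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -246.2 | -241.0 | 126.1 | -252.2 | NA | NA | NA |
| Full Model | 6 | -253.7 | -243.3 | 132.9 | -265.7 | 13.56 | 3 | 0.0036 |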
Fibrocyte2
Figure: Side-by-side boxplot of Phenotype Proportion of Fibrocyte2 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Fibrocyte2
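| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -238.9 | -233.4 | 122.5 | -244.9 | NA | NA | NA |
| Full Model | 6 | -243.6 | -232.5 | 127.8 | -255.6 | 10.69 | 3 | 0.0135 |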
M2
Figure: Side-by-side boxplot of Phenotype Proportion of M2 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for M2
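| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -200.6 | -195.0 | 103.3 | -206.6 | NA | NA | NA |
| Full Model | 6 | -206.0 | -194.9 | 109.0 | -218.0 | 11.4 | 3 | 0.0097 |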
PopSMA+Fx13a+
Figure: Side-by-side boxplot of Phenotype Proportion of PopSMA+Fx13a+ against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for PopSMA+Fx13a+
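| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -213.1 | -208.6 | 109.5 | -219.1 | NA | NA | NA |
| Full Model | 6 | -230.3 | -221.3 | 121.2 | -242.3 | 23.22 | 3 | <0.0001 |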
ResMac
Figure: Side-by-side boxplot of Phenotype Proportion of ResMac against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for ResMac
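| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -179.0 | -173.3 | 92.5 | -185.0 | NA | NA | NA |
| Full Model | 6 | -182.9 | -171.6 | 97.5 | -194.9 | 9.95 | 3 | 0.019 |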
SMA1
Figure: Side-by-side boxplot of Phenotype Proportion of SMA1 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for SMA1
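| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -79.9 | -74.2 | 42.9 | -85.9 | NA | NA | NA |
| Full Model | 6 | -106.3 | -94.9 | 59.1 | -118.3 | 32.45 | 3 | <0.0001 |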
SMA3
Figure: Side-by-side boxplot of Phenotype Proportion of SMA3 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for SMA3
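| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -91.6 | -88.9 | 48.8 | -97.6 | NA | NA | NA |
| Full Model | 6 | -97.4 | -92.0 | 54.7 | -109.4 | 11.79 | 3 | 0.0081 |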
C2a
Figure: Side-by-side boxplot of Phenotype Proportion of C2a against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for C2a
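| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -68.5 | -63.6 | 37.2 | -74.5 | NA | NA | NA |
| Full Model | 6 | -85.9 | -76.3 | 49.0 | -97.9 | 23.49 | 3 | <0.0001 |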
Activated Th
Figure: Side-by-side boxplot of Phenotype Proportion of Activated Th against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Activated Th
| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -11.2 | -13.0 | 8.6 | -17.2 | NA | NA | NA |
| Full Model | 4 | -13.6 | -16.1 | 10.8 | -21.6 | 4.45 | 1 | 0.0348 |
SMA2
Figure: Side-by-side boxplot of Phenotype Proportion of SMA2 against Fibrotic Zone for ICM ROIs.
Figure: Side-by-side boxplot of Phenotype Proportion of C1 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for C1
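| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -90.8 | -85.2 | 48.4 | -96.8 | NA | NA | NA |
| Full Model | 6 | -109.3 | -98.0 | 60.6 | -121.3 | 24.5 | 3 | <0.0001 |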
C2b
Figure: Side-by-side boxplot of Phenotype Proportion of C2b against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for C2b
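| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -255.2 | -250.5 | 130.6 | -261.2 | NA | NA | NA |
| Full Model | 6 | -267.5 | -258.2 | 139.7 | -279.5 | 18.3 | 3 | 0.0004 |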
C3a
Figure: Side-by-side boxplot of Phenotype Proportion of C3a against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for C3a
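| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -118.5 | -113.6 | 62.3 | -124.5 | NA | NA | NA |
| Full Model | 6 | -125.7 | -115.9 | 68.8 | -137.7 | 13.2 | 3 | 0.0043 |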
C3b
Figure: Side-by-side boxplot of Phenotype Proportion of C3b against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for C3b
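| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -110.7 | -106.3 | 58.4 | -116.7 | NA | NA | NA |
| Full Model | 6 | -107.9 | -99.1 | 60.0 | -119.9 | 3.2 | 3 | 0.3614 |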
C4
Figure: Side-by-side boxplot of Phenotype Proportion of C4 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for C4
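| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -141.0 | -136.7 | 73.5 | -147.0 | NA | NA | NA |
| Full Model | 6 | -136.2 | -127.6 | 74.1 | -148.2 | 1.1 | 3 | 0.774 |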
CD45RO
Figure: Side-by-side boxplot of Phenotype Proportion of CD45RO against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for CD45RO
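| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -196.0 | -190.7 | 101.0 | -202.0 | NA | NA | NA |
| Full Model | 6 | -205.9 | -195.3 | 108.9 | -217.9 | 15.9 | 3 | 0.0012 |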
CytoT
Figure: Side-by-side boxplot of Phenotype Proportion of CytoT against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for CytoT
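| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -213.7 | -209.8 | 109.9 | -219.7 | NA | NA | NA |
| Full Model | 6 | -211.4 | -203.6 | 111.7 | -223.4 | 3.7 | 3 | 0.2962 |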
Endo
Figure: Side-by-side boxplot of Phenotype Proportion of Endo against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Endo
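| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -208.3 | -202.6 | 107.1 | -214.3 | NA | NA | NA |
| Full Model | 6 | -207.8 | -196.4 | 109.9 | -219.8 | 5.5 | 3 | 0.1391 |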
Fib1
Figure: Side-by-side boxplot of Phenotype Proportion of Fib1 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Fib1
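| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -88.5 | -82.9 | 47.3 | -94.5 | NA | NA | NA |
| Full Model | 6 | -89.9 | -78.6 | 51.0 | -101.9 | 7.4 | 3 | 0.0609 |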
Fib2
Figure: Side-by-side boxplot of Phenotype Proportion of Fib2 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Fib2
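| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -128.2 | -123.3 | 67.1 | -134.2 | NA | NA | NA |
| Full Model | 6 | -131.5 | -121.7 | 71.7 | -143.5 | 9.3 | 3 | 0.0255 |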
Fib3
Figure: Side-by-side boxplot of Phenotype Proportion of Fib3 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Fib3
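| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -246.2 | -241.0 | 126.1 | -252.2 | NA | NA | NA |
| Full Model | 6 | -253.7 | -243.3 | 132.9 | -265.7 | 13.6 | 3 | 0.0036 |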
Fibrocyte
Figure: Side-by-side boxplot of Phenotype Proportion of Fibrocyte against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Fibrocyte
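| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -101.1 | -97.9 | 53.5 | -107.1 | NA | NA | NA |
| Full Model | 5 | -97.3 | -92.1 | 53.7 | -107.3 | 0.3 | 2 | 0.879 |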
Fibrocyte2
Figure: Side-by-side boxplot of Phenotype Proportion of Fibrocyte2 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Fibrocyte2
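| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -238.9 | -233.4 | 122.5 | -244.9 | NA | NA | NA |
| Full Model | 6 | -243.6 | -232.5 | 127.8 | -255.6 | 10.7 | 3 | 0.0135 |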
HypoEndo
Figure: Side-by-side boxplot of Phenotype Proportion of HypoEndo against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for HypoEndo
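| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -163.2 | -158.6 | 84.6 | -169.2 | NA | NA | NA |
| Full Model | 6 | -161.7 | -152.3 | 86.8 | -173.7 | 4.4 | 3 | 0.2182 |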
HypoEndo1
Figure: Side-by-side boxplot of Phenotype Proportion of HypoEndo1 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for HypoEndo1
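| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -150.9 | -145.5 | 78.4 | -156.9 | NA | NA | NA |
| Full Model | 6 | -149.8 | -139.1 | 80.9 | -161.8 | 4.9 | 3 | 0.1797 |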
LymphEndo
Figure: Side-by-side boxplot of Phenotype Proportion of LymphEndo against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for LymphEndo
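| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -137.4 | -134.2 | 71.7 | -143.4 | NA | NA | NA |
| Full Model | 5 | -135.8 | -130.6 | 72.9 | -145.8 | 2.4 | 2 | 0.2959 |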
M2
Figure: Side-by-side boxplot of Phenotype Proportion of M2 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for M2
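| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -200.6 | -195.0 | 103.3 | -206.6 | NA | NA | NA |
| Full Model | 6 | -206.0 | -194.9 | 109.0 | -218.0 | 11.4 | 3 | 0.0097 |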
MonMac
Figure: Side-by-side boxplot of Phenotype Proportion of MonMac against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for MonMac
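| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -59.2 | -57.3 | 32.6 | -65.2 | NA | NA | NA |
| Full Model | 5 | -55.6 | -52.4 | 32.8 | -65.6 | 0.4 | 2 | 0.8296 |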
Neutrophil1
Figure: Side-by-side boxplot of Phenotype Proportion of Neutrophil1 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Neutrophil1
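| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -184.1 | -179.0 | 95.0 | -190.1 | NA | NA | NA |
| Full Model | 6 | -182.5 | -172.4 | 97.2 | -194.5 | 4.4 | 3 | 0.2189 |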
Neutrophil2
Figure: Side-by-side boxplot of Phenotype Proportion of Neutrophil2 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Neutrophil2
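| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -136.3 | -133.3 | 71.1 | -142.3 | NA | NA | NA |
| Full Model | 6 | -137.0 | -131.1 | 74.5 | -149.0 | 6.8 | 3 | 0.0797 |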
PopS100+
Figure: Side-by-side boxplot of Phenotype Proportion of PopS100+ against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for PopS100+
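| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -207.5 | -202.0 | 106.7 | -213.5 | NA | NA | NA |
| Full Model | 6 | -206.9 | -195.9 | 109.4 | -218.9 | 5.4 | 3 | 0.1465 |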
PopSMA+Fx13a+
Figure: Side-by-side boxplot of Phenotype Proportion of PopSMA+Fx13a+ against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for PopSMA+Fx13a+
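| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -213.1 | -208.6 | 109.5 | -219.1 | NA | NA | NA |
| Full Model | 6 | -230.3 | -221.3 | 121.2 | -242.3 | 23.2 | 3 | <0.0001 |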
ResMac
Figure: Side-by-side boxplot of Phenotype Proportion of ResMac against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for ResMac
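| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -179.0 | -173.3 | 92.5 | -185.0 | NA | NA | NA |
| Full Model | 6 | -182.9 | -171.6 | 97.5 | -194.9 | 10.0 | 3 | 0.019 |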
SMA1
Figure: Side-by-side boxplot of Phenotype Proportion of SMA1 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for SMA1
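| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -79.9 | -74.2 | 42.9 | -85.9 | NA | NA | NA |
| Full Model | 6 | -106.3 | -94.9 | 59.1 | -118.3 | 32.4 | 3 | <0.0001 |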
SMA3
Figure: Side-by-side boxplot of Phenotype Proportion of SMA3 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for SMA3
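| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -91.6 | -88.9 | 48.8 | -97.6 | NA | NA | NA |
| Full Model | 6 | -97.4 | -92.0 | 54.7 | -109.4 | 11.8 | 3 | 0.0081 |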
TReg
Figure: Side-by-side boxplot of Phenotype Proportion of TReg against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for TReg
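| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -86.6 | -85.7 | 46.3 | -92.6 | NA | NA | NA |
| Full Model | 4 | -84.8 | -83.6 | 46.4 | -92.8 | 0.3 | 1 | 0.6087 |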
Th
Figure: Side-by-side boxplot of Phenotype Proportion of Th against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Th
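| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -147.1 | -144.0 | 76.6 | -153.1 | NA | NA | NA |
| Full Model | 5 | -148.2 | -143.0 | 79.1 | -158.2 | 5.0 | 2 | 0.0803 |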
pDC
Figure: Side-by-side boxplot of Phenotype Proportion of pDC against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for pDC
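| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -239.9 | -236.1 | 123.0 | -245.9 | NA | NA | NA |
| Full Model | 6 | -235.6 | -228.1 | 123.8 | -247.6 | 1.7 | 3 | 0.6347 |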
C2a
Figure: Side-by-side boxplot of Phenotype Proportion of C2a against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for C2a
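| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -68.5 | -63.6 | 37.2 | -74.5 | NA | NA | NA |
| Full Model | 6 | -85.9 | -76.3 | 49.0 | -97.9 | 23.5 | 3 | <0.0001 |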
B
Figure: Side-by-side boxplot of Phenotype Proportion of B against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for B
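| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -103.3 | -100.5 | 54.7 | -109.3 | NA | NA | NA |
| Full Model | 5 | -100.7 | -96.0 | 55.4 | -110.7 | 1.4 | 2 | 0.5 |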
M1
Figure: Side-by-side boxplot of Phenotype Proportion of M1 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for M1
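| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -113.6 | -111.7 | 59.8 | -119.6 | NA | NA | NA |
| Full Model | 6 | -112.1 | -108.2 | 62.0 | -124.1 | 4.4 | 3 | 0.217 |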
PopCD45RO+HLADR+
Figure: Side-by-side boxplot of Phenotype Proportion of PopCD45RO+HLADR+ against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for PopCD45RO+HLADR+
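| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -83.3 | -81.3 | 44.6 | -89.3 | NA | NA | NA |
| Full Model | 5 | -82.0 | -78.8 | 46.0 | -92.0 | 2.7 | 2 | 0.2569 |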
Activated Th
Figure: Side-by-side boxplot of Phenotype Proportion of Activated Th against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Activated Th
| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -11.2 | -13.0 | 8.6 | -17.2 | NA | NA | NA |
| Full Model | 4 | -13.6 | -16.1 | 10.8 | -21.6 | 4.5 | 1 | 0.0348 |
Mast
Figure: Side-by-side boxplot of Phenotype Proportion of Mast against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for Mast
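| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -81.7 | -81.1 | 43.8 | -87.7 | NA | NA | NA |
| Full Model | 6 | -78.7 | -77.5 | 45.4 | -90.7 | 3.1 | 3 | 0.3834 |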
SMA2
Figure: Side-by-side boxplot of Phenotype Proportion of SMA2 against Fibrotic Zone for ICM ROIs.
ANOVA Summary Table for SMA2
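| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -67.3 | -64.9 | 36.6 | -73.3 | NA | NA | NA |
| Full Model | 6 | -73.0 | -68.4 | 42.5 | -85.0 | 11.8 | 3 | 0.0082 |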
5.1.2 Within DCM
Within the DCM Aetiology, the effect of Fibrotic Zone on Phenotype Proportion was significant in the C1, C2a, CD45RO, Fib3, M2, PopS100+, PopSMA+Fx13a+, ResMac, SMA1, C4, LymphEndo, SMA3 and Mast cellular phenotypes.
These significant results are shown below. Results for all phenotypes can be found under the ‘All Results’ tab below.
Figure: Side-by-side boxplot of Phenotype Proportion of C1 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for C1
| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -69.4 | -64.2 | 37.7 | -75.4 | NA | NA | NA |
| Full Model | 6 | -86.2 | -75.9 | 49.1 | -98.2 | 22.8 | 3 | <0.0001 |
C2a
Figure: Side-by-side boxplot of Phenotype Proportion of C2a against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for C2a
| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -81.1 | -76.1 | 43.6 | -87.1 | NA | NA | NA |
| Full Model | 6 | -87.6 | -77.6 | 49.8 | -99.6 | 12.5 | 3 | 0.006 |
C3a
Figure: Side-by-side boxplot of Phenotype Proportion of C3a against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for C3a
| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -116.7 | -112.5 | 61.3 | -122.7 | NA | NA | NA |
| Full Model | 6 | -117.4 | -109.0 | 64.7 | -129.4 | 6.7 | 3 | 0.0805 |
C3b
Figure: Side-by-side boxplot of Phenotype Proportion of C3b against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for C3b
| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -156.6 | -153.3 | 81.3 | -162.6 | NA | NA | NA |
| Full Model | 6 | -154.1 | -147.5 | 83.0 | -166.1 | 3.4 | 3 | 0.3285 |
CD45RO
Figure: Side-by-side boxplot of Phenotype Proportion of CD45RO against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for CD45RO

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -129.5 | -124.5 | 67.8 | -135.5 | NA | NA | NA |
| Full Model | 6 | -137.3 | -127.3 | 74.6 | -149.3 | 13.8 | 3 | 0.0033 |

CytoT
Figure: Side-by-side boxplot of Phenotype Proportion of CytoT against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for CytoT

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -144.6 | -141.8 | 75.3 | -150.6 | NA | NA | NA |
| Full Model | 5 | -143.5 | -138.8 | 76.8 | -153.5 | 2.9 | 2 | 0.2327 |

Endo
Figure: Side-by-side boxplot of Phenotype Proportion of Endo against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Endo

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -186.7 | -181.5 | 96.4 | -192.7 | NA | NA | NA |
| Full Model | 6 | -187.1 | -176.5 | 99.5 | -199.1 | 6.3 | 3 | 0.0959 |

Fib1
Figure: Side-by-side boxplot of Phenotype Proportion of Fib1 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Fib1

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -62.5 | -57.2 | 34.2 | -68.5 | NA | NA | NA |
| Full Model | 6 | -60.6 | -50.1 | 36.3 | -72.6 | 4.2 | 3 | 0.2435 |

Fib2
Figure: Side-by-side boxplot of Phenotype Proportion of Fib2 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Fib2

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -130.5 | -125.8 | 68.3 | -136.5 | NA | NA | NA |
| Full Model | 6 | -127.8 | -118.3 | 69.9 | -139.8 | 3.3 | 3 | 0.3544 |

Fib3
Figure: Side-by-side boxplot of Phenotype Proportion of Fib3 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Fib3

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -178.0 | -174.5 | 92.0 | -184.0 | NA | NA | NA |
| Full Model | 6 | -180.7 | -173.7 | 96.4 | -192.7 | 8.7 | 3 | 0.0333 |

Fibrocyte2
Figure: Side-by-side boxplot of Phenotype Proportion of Fibrocyte2 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Fibrocyte2

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -208.9 | -204.0 | 107.4 | -214.9 | NA | NA | NA |
| Full Model | 6 | -207.2 | -197.3 | 109.6 | -219.2 | 4.3 | 3 | 0.2339 |

HypoEndo
Figure: Side-by-side boxplot of Phenotype Proportion of HypoEndo against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for HypoEndo

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -171.8 | -167.1 | 88.9 | -177.8 | NA | NA | NA |
| Full Model | 6 | -169.0 | -159.5 | 90.5 | -181.0 | 3.2 | 3 | 0.3639 |

HypoEndo1
Figure: Side-by-side boxplot of Phenotype Proportion of HypoEndo1 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for HypoEndo1

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -116.1 | -111.1 | 61.1 | -122.1 | NA | NA | NA |
| Full Model | 6 | -113.5 | -103.4 | 62.8 | -125.5 | 3.4 | 3 | 0.3375 |

M2
Figure: Side-by-side boxplot of Phenotype Proportion of M2 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for M2

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -198.1 | -192.8 | 102.1 | -204.1 | NA | NA | NA |
| Full Model | 6 | -208.9 | -198.3 | 110.5 | -220.9 | 16.8 | 3 | 0.0008 |

Neutrophil1
Figure: Side-by-side boxplot of Phenotype Proportion of Neutrophil1 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Neutrophil1

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -254.2 | -249.3 | 130.1 | -260.2 | NA | NA | NA |
| Full Model | 6 | -255.9 | -246.3 | 134.0 | -267.9 | 7.8 | 3 | 0.0505 |

Neutrophil2
Figure: Side-by-side boxplot of Phenotype Proportion of Neutrophil2 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Neutrophil2

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -154.4 | -150.4 | 80.2 | -160.4 | NA | NA | NA |
| Full Model | 6 | -150.8 | -142.8 | 81.4 | -162.8 | 2.4 | 3 | 0.4953 |

PopS100+
Figure: Side-by-side boxplot of Phenotype Proportion of PopS100+ against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for PopS100+

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -179.8 | -174.6 | 92.9 | -185.8 | NA | NA | NA |
| Full Model | 6 | -186.5 | -176.0 | 99.2 | -198.5 | 12.6 | 3 | 0.0055 |

PopSMA+Fx13a+
Figure: Side-by-side boxplot of Phenotype Proportion of PopSMA+Fx13a+ against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for PopSMA+Fx13a+

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -230.0 | -225.7 | 118.0 | -236.0 | NA | NA | NA |
| Full Model | 6 | -232.2 | -223.6 | 122.1 | -244.2 | 8.2 | 3 | 0.0428 |

ResMac
Figure: Side-by-side boxplot of Phenotype Proportion of ResMac against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for ResMac

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -163.4 | -158.1 | 84.7 | -169.4 | NA | NA | NA |
| Full Model | 6 | -175.0 | -164.4 | 93.5 | -187.0 | 17.6 | 3 | 0.0005 |

SMA1
Figure: Side-by-side boxplot of Phenotype Proportion of SMA1 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for SMA1

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -30.6 | -25.3 | 18.3 | -36.6 | NA | NA | NA |
| Full Model | 6 | -49.0 | -38.4 | 30.5 | -61.0 | 24.4 | 3 | <0.0001 |

Th
Figure: Side-by-side boxplot of Phenotype Proportion of Th against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Th

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -75.6 | -72.7 | 40.8 | -81.6 | NA | NA | NA |
| Full Model | 5 | -74.8 | -70.1 | 42.4 | -84.8 | 3.3 | 2 | 0.1934 |

pDC
Figure: Side-by-side boxplot of Phenotype Proportion of pDC against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for pDC

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -271.3 | -267.2 | 138.7 | -277.3 | NA | NA | NA |
| Full Model | 6 | -267.4 | -259.2 | 139.7 | -279.4 | 2.1 | 3 | 0.5516 |

B
Figure: Side-by-side boxplot of Phenotype Proportion of B against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for B

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -143.3 | -140.8 | 74.6 | -149.3 | NA | NA | NA |
| Full Model | 5 | -142.8 | -138.6 | 76.4 | -152.8 | 3.5 | 2 | 0.1723 |

C2b
Figure: Side-by-side boxplot of Phenotype Proportion of C2b against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for C2b

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -240.2 | -235.4 | 123.1 | -246.2 | NA | NA | NA |
| Full Model | 6 | -240.8 | -231.2 | 126.4 | -252.8 | 6.6 | 3 | 0.086 |

Fibrocyte
Figure: Side-by-side boxplot of Phenotype Proportion of Fibrocyte against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Fibrocyte

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -90.2 | -90.4 | 48.1 | -96.2 | NA | NA | NA |
| Full Model | 5 | -90.0 | -90.3 | 50.0 | -100.0 | 3.8 | 2 | 0.1498 |

PopCD45RO+HLADR+
Figure: Side-by-side boxplot of Phenotype Proportion of PopCD45RO+HLADR+ against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for PopCD45RO+HLADR+

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -90.8 | -89.6 | 48.4 | -96.8 | NA | NA | NA |
| Full Model | 5 | -91.0 | -89.0 | 50.5 | -101.0 | 4.2 | 2 | 0.1205 |

C4
Figure: Side-by-side boxplot of Phenotype Proportion of C4 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for C4

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -169.4 | -167.1 | 87.7 | -175.4 | NA | NA | NA |
| Full Model | 5 | -181.7 | -177.8 | 95.8 | -191.7 | 16.3 | 2 | 0.0003 |

LymphEndo
Figure: Side-by-side boxplot of Phenotype Proportion of LymphEndo against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for LymphEndo

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -141.0 | -137.1 | 73.5 | -147.0 | NA | NA | NA |
| Full Model | 6 | -149.4 | -141.6 | 80.7 | -161.4 | 14.4 | 3 | 0.0025 |

MonMac
Figure: Side-by-side boxplot of Phenotype Proportion of MonMac against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for MonMac

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -124.2 | -121.2 | 65.1 | -130.2 | NA | NA | NA |
| Full Model | 6 | -121.9 | -115.9 | 66.9 | -133.9 | 3.7 | 3 | 0.2978 |

SMA3
Figure: Side-by-side boxplot of Phenotype Proportion of SMA3 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for SMA3

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -85.8 | -84.1 | 45.9 | -91.8 | NA | NA | NA |
| Full Model | 5 | -90.1 | -87.3 | 50.1 | -100.1 | 8.4 | 2 | 0.0153 |

TReg
Figure: Side-by-side boxplot of Phenotype Proportion of TReg against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for TReg

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -99.7 | -98.2 | 52.8 | -105.7 | NA | NA | NA |
| Full Model | 5 | -100.3 | -97.9 | 55.1 | -110.3 | 4.6 | 2 | 0.0998 |

M1
Figure: Side-by-side boxplot of Phenotype Proportion of M1 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for M1

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -124.6 | -123.1 | 65.3 | -130.6 | NA | NA | NA |
| Full Model | 5 | -124.7 | -122.3 | 67.3 | -134.7 | 4.1 | 2 | 0.1295 |

Activated Th
Figure: Side-by-side boxplot of Phenotype Proportion of Activated Th against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Activated Th

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -92.1 | -91.5 | 49.1 | -98.1 | NA | NA | NA |
| Full Model | 6 | -86.9 | -85.7 | 49.5 | -98.9 | 0.8 | 3 | 0.8536 |

Mast
Figure: Side-by-side boxplot of Phenotype Proportion of Mast against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for Mast

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -86.5 | -86.3 | 46.3 | -92.5 | NA | NA | NA |
| Full Model | 6 | -97.7 | -97.2 | 54.8 | -109.7 | 17.2 | 3 | 0.0007 |

SMA2
Figure: Side-by-side boxplot of Phenotype Proportion of SMA2 against Fibrotic Zone for DCM ROIs.
ANOVA Summary Table for SMA2

| Model | Num. Par. | AIC | BIC | log Lik. | Deviance | Chi² | Df | Pr(>Chi²) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 3 | -54.3 | -52.4 | 30.1 | -60.3 | NA | NA | NA |
| Full Model | 6 | -50.8 | -47.0 | 31.4 | -62.8 | 2.5 | 3 | 0.4717 |

5.2 Key Goal 2 Analysis
Key Goal 2: Compare the proportion of cellular phenotypes present within different Fibrotic Zones between ICM and DCM patients.
Statistical Approach: For a given phenotype and a given Fibrotic Zone, use a Welch two-sample \(t\)-test to test whether mean phenotype proportion differs significantly between ICM and DCM ROIs.
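As a minimal sketch of a single such comparison, the block below runs a Welch test on simulated phenotype proportions for one phenotype in one Fibrotic Zone. The values, group sizes and variable names are illustrative assumptions, not the client's data.

```r
set.seed(1)
# Simulated phenotype proportions for one phenotype in one Fibrotic Zone
# (illustrative values only)
icm_prop <- rnorm(12, mean = 0.10, sd = 0.03)
dcm_prop <- rnorm(9,  mean = 0.15, sd = 0.06)

# t.test() runs the Welch (unequal-variance) test by default (var.equal = FALSE)
welch <- t.test(icm_prop, dcm_prop)

welch$statistic   # t statistic
welch$parameter   # Welch-Satterthwaite degrees of freedom (usually non-integer)
welch$p.value     # two-sided p-value
```

The Welch form is a reasonable default here because the two aetiologies need not share a common variance and the group sizes differ.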
To account for the multiple tests performed, a False Discovery Rate (FDR) correction was applied to the resulting \(p\)-values. This controls the FDR within each Phenotype Group (isolating each Fibrotic Zone). A significant adjusted \(p\)-value therefore means that mean phenotype proportion differs significantly between aetiologies for that Fibrotic Zone.
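To make the correction concrete, the sketch below applies the adjustment to four made-up raw \(p\)-values and reproduces it by hand. In base R, `p.adjust()` treats `"fdr"` as an alias for the Benjamini-Hochberg method. The \(p\)-values are illustrative assumptions.

```r
# Illustrative raw p-values for four tests within one group
p_raw <- c(Remote = 0.40, IF = 0.004, RF = 0.03, RFC = 0.02)

# "fdr" is an alias for the Benjamini-Hochberg method in p.adjust()
p_fdr <- p.adjust(p_raw, method = "fdr")

# The same adjustment by hand: order from largest to smallest, scale each
# p-value by m / rank, then take a running minimum to keep the result monotone
m   <- length(p_raw)
ord <- order(p_raw, decreasing = TRUE)
p_manual <- numeric(m)
p_manual[ord] <- pmin(1, cummin(p_raw[ord] * m / (m:1)))

p_fdr
# Remote 0.400, IF 0.016, RF 0.040, RFC 0.040
```

Note that RF and RFC end up with the same adjusted value: the running minimum prevents a larger raw \(p\)-value from receiving a smaller adjusted one.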
Statistical Assumptions: A Welch two-sample \(t\)-test assumes normality, independence and no extreme outliers in each group. Appealing to the Central Limit Theorem, we consider the data sufficiently close to normally distributed, and we do not believe there are enough outliers to affect the results. Regarding independence, we consider separating the data by phenotype and aetiology sufficient to treat the results as independent.
The analysis yielded seven significant results across two Fibrotic Zones (IF and RF) and four phenotypes (M2, Fib3, LymphEndo and C3b). A significant result indicates that mean phenotype proportion within that Fibrotic Zone differs significantly between the ICM and DCM Aetiologies. These significant results are shown in the table below, with all results available in the drop-down.
Code: Compute differences in phenotype proportion between ICM and DCM patients
between_disease_t = data.frame(matrix(ncol = 4))
colnames(between_disease_t) = c('Phenotype', 'Fibrotic Zone', 'Test Statistic', 'p-value')
metaclusters = unique(cell_proportion_df$annotated_metacluster)
total_tests = 0
dcm_df$aetiology = "DCM"
icm_df$aetiology = "ICM"
fibrotix_zone_list = c('Remote', 'IF', 'RF', 'RFC')
all_roi_df = rbind(dcm_df, icm_df)

# For each phenotype and fibrotic zone
for (p in 1:length(metaclusters)) {
  phenotype_df = all_roi_df |> filter(phenotype == metaclusters[p])
  for (i in 1:length(fibrotix_zone_list)) {
    score_phenotype_df = phenotype_df |> filter(fibrotic_zone == fibrotix_zone_list[i])
    score_phenotype_df_icm = score_phenotype_df |> filter(aetiology == 'ICM')
    score_phenotype_df_dcm = score_phenotype_df |> filter(aetiology == 'DCM')
    # Ensure that there is more than one observation in each aetiology before t test
    if ((dim(score_phenotype_df_icm)[[1]] > 1) & (dim(score_phenotype_df_dcm)[[1]] > 1)) {
      total_tests = total_tests + 1
      t_between = t.test(score_phenotype_df_icm$phenotype_proportion,
                         score_phenotype_df_dcm$phenotype_proportion)
      between_disease_t[nrow(between_disease_t) + 1, ] =
        c(metaclusters[p], fibrotix_zone_list[i],
          as.numeric(t_between$statistic), as.numeric(t_between$p.value))
    }
  }
}

between_disease_t$`p-value` = as.numeric(between_disease_t$`p-value`)
between_disease_t$`Test Statistic` = as.numeric(between_disease_t$`Test Statistic`)

# FDR p-value adjustment (for all tests)
between_disease_t$`Adj. p-value (FDR)` = p.adjust(between_disease_t$`p-value`, method = 'fdr')
between_disease_t = na.omit(between_disease_t)

# Find FDR and Bonferroni adjusted p-values for each Phenotype group
between_disease_t = between_disease_t %>%
  group_by(Phenotype) %>%
  mutate(`Adj. p-value (FDR) (Within Fibrotic Zone)` = p.adjust(`p-value`, method = 'fdr'),
         `Adj. p-value (Bonferroni) (Within Fibrotic Zone)` = p.adjust(`p-value`, method = 'bonferroni')) %>%
  ungroup()

between_disease_t$`Adj. p-value (FDR) (Within Fibrotic Zone)` =
  as.numeric(between_disease_t$`Adj. p-value (FDR) (Within Fibrotic Zone)`)
between_disease_t = between_disease_t |> mutate(across(where(is.numeric), round, 4))

# Only include significant (p-value < 0.05) values and order from smallest to largest
sig_between_disease_t = between_disease_t |>
  filter(across(everything(), ~ !is.na(.x) & .x != "")) |>
  arrange(`Adj. p-value (FDR) (Within Fibrotic Zone)`) |>
  filter(`Adj. p-value (FDR) (Within Fibrotic Zone)` < 0.05)

# Store results in list for tabset output
kg2list = list()
for (z in unique(sig_between_disease_t$`Fibrotic Zone`)) {
  z_tab = sig_between_disease_t[sig_between_disease_t$`Fibrotic Zone` == z, ]
  kg2list[[z]] = list(table = kable(select(z_tab, -c('Adj. p-value (Bonferroni) (Within Fibrotic Zone)'))))
}
5.3 Key Goal 3 Analysis
Key Goal 3: Compare the relationship between phenotype abundance and fibrosis score within DCM and ICM patients.
Statistical Approach: For a given phenotype and a given Aetiology, fit a linear mixed model (LMM) regressing phenotype proportion on Fibrosis Score, with a random effect for each patient, to determine whether the relationship is significant.
Before investigating the relationship between phenotype abundance and Fibrosis Score, the distribution of Fibrosis Scores within each Fibrotic Zone and Aetiology was explored.
We investigated the distribution of Fibrosis Score across the four Fibrotic Zones (Remote, IF, RF, RFC), both between and within the Aetiologies.
Code: Side-by-side boxplots for Fibrosis Score and Fibrotic Zone
patient_df |>
  ggplot() +
  aes(x = fibrotic_zone, y = fibrotic_score) +
  geom_boxplot() +
  # Individual data points to give indication of distribution
  geom_jitter(size = 1, colour = orange, alpha = 0.5, height = 0, width = 0.1) +
  labs(x = "Fibrotic Zone",
       y = "Fibrosis Score",
       title = "Summary of Fibrosis Score Distribution") +
  theme(plot.background = element_rect(fill = "#ffffff", linewidth = 0),
        panel.border = element_rect(colour = "black", fill = NA),
        legend.box.background = element_rect(colour = "black"),
        axis.title = element_text(face = "bold"),
        plot.title = element_text(face = "bold", size = 14, hjust = 0.5)) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(vars(aetiology))
Side-by-side boxplot of Fibrosis Score against Fibrotic Zone faceted by Aetiology.
Three sets of Welch two-sample \(t\)-tests were conducted: between adjacent Fibrotic Zones within ICM, between adjacent Fibrotic Zones within DCM, and between ICM and DCM within each Fibrotic Zone. Within each set of tests, a Bonferroni correction was used to adjust the \(p\)-values.
Statistical Assumptions: As addressed above (see Section 5.2), the assumptions of a Welch two-sample \(t\)-test are independence, no outliers and normality, all of which have been sufficiently met.
Code: Pairwise Welch two-sample t-tests on fibrosis score in different fibrotic zones
dat_icm = patient_df[patient_df$aetiology == 'ICM', ]
dat_dcm = patient_df[patient_df$aetiology == 'DCM', ]

within_icm = data.frame(matrix(ncol = 4))
colnames(within_icm) = c('Fibrotic Zone 1', 'Fibrotic Zone 2', 'Test Statistic', 'p-value')
within_dcm = data.frame(matrix(ncol = 4))
colnames(within_dcm) = c('Fibrotic Zone 1', 'Fibrotic Zone 2', 'Test Statistic', 'p-value')

# Testing for significant differences in Fibrosis Score between
# the 4 different fibrotic zones
for (i in 1:(length(fibrotix_zone_list) - 1)) {
  # Get the data from adjacent fibrotic zones (in terms of level of fibrosis)
  icm_lower = dat_icm[dat_icm$fibrotic_zone == fibrotix_zone_list[i], ]
  icm_upper = dat_icm[dat_icm$fibrotic_zone == fibrotix_zone_list[i + 1], ]
  # Welch t-test
  t_icm = t.test(icm_lower$fibrotic_score, icm_upper$fibrotic_score)
  # Saves info to table
  within_icm[i, ] = c(fibrotix_zone_list[i], fibrotix_zone_list[i + 1],
                      as.numeric(t_icm$statistic), as.numeric(t_icm$p.value))

  # Get the data from adjacent fibrotic zones (in terms of level of fibrosis)
  dcm_lower = dat_dcm[dat_dcm$fibrotic_zone == fibrotix_zone_list[i], ]
  dcm_upper = dat_dcm[dat_dcm$fibrotic_zone == fibrotix_zone_list[i + 1], ]
  # Welch t-test
  t_dcm = t.test(dcm_lower$fibrotic_score, dcm_upper$fibrotic_score)
  # Saves info to table
  within_dcm[i, ] = c(fibrotix_zone_list[i], fibrotix_zone_list[i + 1],
                      as.numeric(t_dcm$statistic), as.numeric(t_dcm$p.value))
}

within_icm$`p-value` = as.numeric(within_icm$`p-value`)
within_icm$`Test Statistic` = as.numeric(within_icm$`Test Statistic`) |> round(2)
within_dcm$`p-value` = as.numeric(within_dcm$`p-value`)
within_dcm$`Test Statistic` = as.numeric(within_dcm$`Test Statistic`) |> round(2)

# Applying Bonferroni Correction
within_icm = within_icm |> mutate(`Adj. p-value` = round(`p-value` * 3, 4))
within_dcm = within_dcm |> mutate(`Adj. p-value` = round(`p-value` * 3, 4))
Code: Welch t-Test of fibrosis score in each fibrotic zone between ICM and DCM patients
between = data.frame(matrix(ncol = 3))
colnames(between) = c('Fibrotic Zone', 'Test Statistic', 'p-value')

# Testing for significant differences in Fibrosis Score between
# Aetiology for the 4 different fibrotic zones
for (i in 1:(length(fibrotix_zone_list))) {
  icm_score = dat_icm[dat_icm$fibrotic_zone == fibrotix_zone_list[i], ]
  dcm_score = dat_dcm[dat_dcm$fibrotic_zone == fibrotix_zone_list[i], ]
  # Welch t-test
  t_between = t.test(icm_score$fibrotic_score, dcm_score$fibrotic_score)
  # Saves info to table
  between[i, ] = c(fibrotix_zone_list[i],
                   as.numeric(t_between$statistic), as.numeric(t_between$p.value))
}

between$`p-value` = as.numeric(between$`p-value`)
between$`Test Statistic` = as.numeric(between$`Test Statistic`) |> round(2)

# Applying Bonferroni Correction (adjusted p-values are capped at 1)
between = between |>
  mutate(`Adj. p-value` = ifelse(`p-value` > 0.25, 1, round(`p-value` * 4, 4)))
Welch t-Test between ICM and DCM ROIs summary table for each Fibrotic Zone.

| Fibrotic Zone | Test Statistic | p-value | Adj. p-value |
|---|---|---|---|
| Remote | 0.32 | 0.7563 | 1.0000 |
| IF | 0.39 | 0.7013 | 1.0000 |
| RF | 2.62 | 0.0154 | 0.0616 |
| RFC | -2.16 | 0.0466 | 0.1862 |

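The adjusted values in this table can be approximately reproduced from the printed raw \(p\)-values. The sketch below uses base R's `p.adjust()` with `method = "bonferroni"`, which multiplies each \(p\)-value by the number of tests and caps the result at 1. The inputs are the rounded values from the table, so small discrepancies are expected.

```r
# Raw p-values as printed (rounded) in the summary table
p_raw <- c(Remote = 0.7563, IF = 0.7013, RF = 0.0154, RFC = 0.0466)

# Bonferroni: multiply by the number of tests and cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Equivalent manual form
p_manual <- pmin(1, p_raw * length(p_raw))

p_bonf
# Remote 1.0000, IF 1.0000, RF 0.0616, RFC 0.1864
```

The RFC value comes out as 0.1864 rather than the tabulated 0.1862, presumably because the report adjusts the unrounded \(p\)-value before rounding.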
Within the DCM Aetiology, the average Fibrosis Score increases from each Fibrotic Zone to the next, and the difference between each pair of adjacent zones is significant (as shown by the \(t\)-test results).
Within the ICM Aetiology, average Fibrosis Score increases from Remote to IF and from IF to RF; however, there is no significant difference in average Fibrosis Score between the RF and RFC Fibrotic Zones. This is important to consider in further analysis, given that Fibrotic Zone is used as an ordinal measure of the degree of fibrosis in a region.
When comparing the average Fibrosis Score for a Fibrotic Zone, there is no significant difference between ICM and DCM patients.
To investigate the non-significant difference in mean Fibrosis Score between ICM's RF and RFC Fibrotic Zones, a Two-Way ANOVA was conducted. This tested whether there was an interaction effect between disease Aetiology and Fibrotic Zone on Fibrosis Score. In other words, it tested whether the relationship between Fibrotic Zone and Fibrosis Score differs significantly between the two Aetiology groups. The test was blocked by Patient ID to reduce individual patient discrepancies (sometimes known as a repeated-measures Two-Way ANOVA).
Statistical Assumptions: The assumptions of a Two-Way ANOVA are independence of observations, homoscedasticity and normality. Independence is sufficiently ensured by the use of Patient ID as a blocking factor, and normality is met by the Central Limit Theorem. Homoscedasticity is sufficient for this analysis.
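The exact model call is not reproduced in this rendering, so the following is only a sketch of one way such a blocked interaction test could be set up, assuming patient enters as an additive blocking factor in a linear model. The data are simulated, and the names `sim`, `zone` and `score` are hypothetical.

```r
set.seed(42)
# Simulated layout: 16 patients (8 per aetiology), one ROI per Fibrotic Zone each.
# All values are illustrative; they are not the client's data.
sim <- expand.grid(
  patient_id = factor(sprintf("P%02d", 1:16)),
  zone       = factor(c("Remote", "IF", "RF", "RFC"),
                      levels = c("Remote", "IF", "RF", "RFC"))
)
sim$aetiology <- factor(ifelse(as.integer(sim$patient_id) <= 8, "ICM", "DCM"))

# Fibrosis score rises with zone; in ICM the RF -> RFC step is flattened
zone_mean <- c(Remote = 0.10, IF = 0.30, RF = 0.50, RFC = 0.70)
sim$score <- unname(zone_mean[as.character(sim$zone)]) -
  ifelse(sim$aetiology == "ICM" & sim$zone == "RFC", 0.20, 0) +
  rnorm(nrow(sim), sd = 0.05)

# Patient is a blocking factor; the interaction term tests whether the
# zone-score relationship differs between aetiologies
fit <- lm(score ~ aetiology * zone + patient_id, data = sim)
tab <- anova(fit)
tab["aetiology:zone", ]
```

Because patients are nested within aetiology, the patient block absorbs between-patient variation, and the row of interest is the `aetiology:zone` interaction rather than the aetiology main effect.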
Code: Two-way ANOVA for Interaction Effect between Aetiology and Fibrotic Zone
With a \(p\)-value of 0.0196, this test showed a significant interaction effect. That is, the relationship between Fibrosis Score and Fibrotic Zone is different between ICM and DCM patients. This is demonstrated by the trace plot, showing that this change in the relationship between each Aetiology occurs between the RF and RFC Fibrotic Zones. This may point to the misclassification of some ROI zones, which can be investigated further.
To investigate the relationship between phenotype abundance and Fibrosis Score, a Linear Mixed Model (LMM) was fitted, accounting for the random effect of each patient. The \(p\)-values associated with the model's coefficients were used to assess significance.
Statistical Assumptions: As for Key Goal 1 (Section 5.1), LMMs assume a linear relationship and independent, normally distributed error terms with constant variance. Again, visual inspection of the data through exploratory analysis indicated that these assumptions were satisfied.
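As a rough illustration of this kind of model, the sketch below fits a random-intercept LMM with `lmerTest::lmer()` on simulated data. The formula is an assumption about how such a model might be specified; the column names mirror those used in the report's data frames, and the values are invented.

```r
library(lmerTest)  # loads lme4 and adds Satterthwaite p-values to lmer() summaries

set.seed(7)
# Simulated data: 8 patients with 6 ROIs each (illustrative values only)
sim <- data.frame(patient_id = factor(rep(sprintf("P%02d", 1:8), each = 6)))
sim$fibrotic_score <- runif(nrow(sim), min = 0, max = 1)
patient_shift <- rnorm(8, sd = 0.03)            # patient-level random intercepts
sim$phenotype_proportion <- 0.30 -
  0.20 * sim$fibrotic_score +                    # true negative slope
  patient_shift[as.integer(sim$patient_id)] +
  rnorm(nrow(sim), sd = 0.02)

# Random intercept per patient; fixed effect of Fibrosis Score
fit <- lmer(phenotype_proportion ~ fibrotic_score + (1 | patient_id), data = sim)

# Coefficient table: estimate, standard error, df, t value and p-value
coefs <- summary(fit)$coefficients
coefs["fibrotic_score", c("Estimate", "Pr(>|t|)")]
```

The sign of the `fibrotic_score` estimate gives the direction of the relationship, and its \(p\)-value is the quantity used to judge significance.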
Code: Linear Mixed Model between phenotype abundance and fibrosis score
Within the ICM aetiology, the effect of Fibrosis Score on phenotype proportion was significant, and negative for C1, C2b, C3a, Fib1 and C2a. This effect was significant and positive for Neutrophil1.
Figure: Scatter Plot of Phenotype Proportion of Activated Th against Fibrosis Score for ICM ROIs.
Within the DCM aetiology, the effect of Fibrosis Score on phenotype proportion was significant and negative for C1, C2a, C3a, Fib1, C2b, C4 and Mast. This effect was significant and positive for M2.
Figure: Scatter Plot of Phenotype Proportion of Activated Th against Fibrosis Score for DCM ROIs.
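In addressing Key Goal 1 (Section 3.1), investigation of phenotype proportion across Fibrotic Zones demonstrated a significant difference in mean phenotype proportion between at least one pair of Fibrotic Zones for each of the following phenotypes within each aetiology:
ICM: C1, C2b, C3a, CD45RO, Fib2, Fib3, Fibrocyte2, M2, PopSMA+Fx13a+, ResMac, SMA1, SMA3, C2a, Activated Th and SMA2.
DCM: C1, C2a, CD45RO, Fib3, M2, PopS100+, PopSMA+Fx13a+, ResMac, SMA1, C4, LymphEndo, SMA3 and Mast.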
This reveals that phenotype proportion varies significantly between Fibrotic Zones (for at least one pair of zones) for each of the above phenotypes in their respective disease aetiology.
Regarding Key Goal 2 (Section 3.2), and building on the significant results from Key Goal 1 (Section 5.1), comparison of phenotype proportion within Fibrotic Zones between Aetiologies revealed significant differences in M2, LymphEndo, Fib3 and C3b in the IF Fibrotic Zone, and in Fib3, LymphEndo and C3b in the RF Fibrotic Zone.
For Key Goal 3 (Section 3.3), the linear mixed models revealed the following significant linear relationships between phenotype abundance and Fibrosis Score within each Aetiology:
ICM:
Positive: Neutrophil1.
Negative: C1, C2a, C3a, Fib1 and C2b.
DCM:
Positive: M2.
Negative: C1, C2a, C3a, Fib1, C2b, C4 and Mast.
Comparing the two aetiologies, both ICM and DCM show negative linear relationships between phenotype abundance and Fibrosis Score for C1, C2a, C3a, Fib1 and C2b.
For further analysis and investigation, additional code details relating to all results are in the appendix.
7 Appendix
Exploratory Data Analysis
The following column graph shows cell proportion within each disease Aetiology for each Fibrotic Zone and Cellular Phenotype. The graph highlights that:
across all zones and both aetiologies, the proportion of the Fib1 phenotype is high.
in both aetiologies as Fibrotic Zone increases, the proportion of:
SMA1 decreases.
B1 increases.
The graph also details the number of samples within each category, to highlight the differences in sample numbers based on individual patient data. Samples are most abundant in the IF Fibrotic Zone for both ICM and DCM, and in the RF Zone for ICM.
---title: "SCDL3926 Project 1"subtitle: "A Statistical Analysis of Biomedical CVD Samples"author: - "**SIDs**: XX" - "**Client**: XX"title-block-banner: "#d85f33"date: "`r format(Sys.time(), '%d %B, %Y %H:%M')`"format: html: self_contained: true theme: - theme/style.scss - united embed-resources: true code-fold: true code-tools: truetable-of-contents: truenumber-sections: truepage-layout: fullsidebar-width: 0pxfig-align: centercss: theme/style.csseditor: markdown: wrap: 72---```{r message=FALSE, warning = FALSE}#| code-summary: "Code: Setup"library(tidyverse)library(kableExtra)library(rstatix)library(ggpubr)library(knitr)library(broom)library(DT)library(janitor)library(emmeans)library(tippy)library(ggfortify)library(lmerTest)orange = "#d85f33"lightorange = "#fcbaa2"tippy::tippy_this(elementId = "power", tooltip = "The ability for a statistical test to detect a significant result, given the significant result actually exists.")tippy::tippy_this(elementId = "indep", tooltip = "The values of one datapoint (ROI) does not depend on any other datapoint (any other ROI).")tippy::tippy_this(elementId = "homoskedasticity", tooltip = "The variance of an error term or residual is independent.")```# Executive SummaryThe client presents a study investigating cellular phenotypes present intwo cardiovascular disease Aetiologies, Dilated Cardiomyopathy (DCM) andIschemic Cardiomyopathy (ICM). Based on a sample of cellular markingresults from 16 patients (8 within each Aetiology) and a conversationwith the client, three key goals were identified. Using the proportionof phenotype cells in a given ROI, this statistical investigation found thefollowing results for each goal:1. Compare cellular phenotypes present with different Fibrotic Zones **within** ICM patients and **within** DCM patients. 
Within each Aetiology, the following cellular phenotypes demonstrated a significant difference in phenotype proportion between Fibrotic zones: - **ICM**: C1, C2b, C3a, CD45RO, Fib2, Fib3, Fibrocyte2, M2, PopSMA+Fx13a+, ResMac, SMA1, SMA3, C2a, Activated Th and SMA2 - **DCM**: C1, C2a, CD45RO, Fib3, M2, PopS100+, PopSMA+Fx13a+, ResMac, SMA1, C4, LymphEndo, SMA3 and Mast2. Compare cellular phenotypes present within different Fibrotic Zones **between** ICM and DCM patients. For the IF Fibrotic Zone, mean phenotype proportion was significantly different between ICM and DCM samples within M2, LymphEndo, Fib3 and C3b phenotypes. In the RF Fibrotic Zone, average phenotype proportion was significantly different for Fib3. LymphEndo and C3b phenotypes.3. Compare the relationship between phenotype abundance and Fibrosis score within ICM and DCM patients. The relationship between phenotype proportion and Fibrosis Score was significant for the following phenotypes within each aetiology: - **ICM**: Positive Relationship: C1, C2b, C3a, Fib1 and C2a; Negative Relationship: Neutrophil1 - **DCM**: Positive Relationship: C1, C2a, C3a, Fib1, C2b, C4 and Mast; Negative Relationship: M2# BackgroundCardiovascular disease (CVD) is a leading cause of death and disabilityworldwide. To diagnose and combat CVD, the client has designed a38-parameter panel of antibodies for single-cell expression profilingand spatial mapping of the myocardial microenvironment. This process wasaided by machine learning tools which have been optimised to thetopology of the human heart. The data the client has collected has beensourced from 16 total patients, and was collected from multiplephysiological zones within the myocardium. 
These patients all haveexperienced heart failure, derived from either Dilated Cardiomyopathy(DCM) or Ischemic Cardiomyopathy (ICM).ICM involves left ventricle dilation caused by vessel disease, while DCMis left ventricle dilation caused by a non-vessel disease, and isinfluenced by genetics and excessive alcohol consumption. Collagenwithin the heart is known as scar tissue, while myocardium representshealthy muscle tissue. This distinction is vital in determining theFibrotic Zone of a tissue sample. Fibrotic Zone will be addressedfurther in depth in the following sections. Four different FibroticZones have been identified; Remote, IF, RF and RFC. These correspond toan increasing abundance of fibrotic tissue with a sample.# Goals of AnalysisThrough analysis of the client's background information, the clientconsultation, and the client's data, the consulting team identified 3key goals:## Key Goal 1 {#sec-kg1}Compare the proportion of cellular phenotypes present with differentFibrotic Zones **within** ICM patients and **within** DCM patients.## Key Goal 2 {#sec-kg2}Compare the proportion of cellular phenotypes present within differentFibrotic Zones **between** ICM and DCM patients.## Key Goal 3 {#sec-kg3}Compare the relationship between phenotype abundance and fibrosis scorewithin DCM and ICM patients.# Datasets and Considerations {#sec-data}## DatasetsThe datasets provided by the client represent heart samples from 16patients, 8 with ICM and 8 with DCM. The data was compiled by our clientusing machine learning techniques catered to her research goals. 
Thereare two datasets utilized within this report: a set referencing sampleand area information and a set compiling data on single cells.The dataset compiling sample and area information is composed of 8columns:<details><summary>ROI Data Column Summary</summary>- **ROI**: Region of interest- **Sample**: Specifies the patient the sample is pulled from- **Group**: Denotes the zone of the sample, as well as the sample's disease- **Batch**: Denotes the batch associated with the sample- **Aetiology**: Whether the sample is associated with a patient with ICM or DCM- **Scar**: Area of scarring- **Myocardium**: Area of myocardium- **Background**: Background area due to imaging gaps, which is ignored in the calculation of the area. Throughout this report, area refers to the sum of the scar area and the myocardium area, with the fibrosis score being the percentage of scar area compared to the total (non-background) area.</details>The dataset compiling data on single cells has 57 columns:<details><summary>Single Cell Data Column Summary</summary>- **ROI**: Region of Interest- **ID**: The ID number assigned to the cell- **X**: The x-coordinates of the pixel, with a pixel measured in a one micromillimeter by one micromillimeter micron- **Y**: The y-coordinates of of the pixel, with a pixel measured in a one micromillimeter by one micromillimeter micron- **Area**: Area of the pixel- **Cell Type**: A numerical organisation of cells, which is not relevant to the analysis- **Regions**: A numerical organisation of regions, which is not relevant to the analysis- **Annotated Cell Type**: Refers to the type of cell the sample is pulled from, all of which are cells- **Annotated Region**: Whether the single cell is a myocardium cell or a scar cell- **Markers**: Percentage coverage for each type of biological marker within a pixel- **Annotated Metacluster**: Labeling the various cellular phenotypes- **Sample**: Specifies the patient the sample is pulled from- **Group**: Denotes the zone 
of the sample, as well as the sample's disease- **Batch**: Denotes the batch associated with the sample- **Aetiology**: Whether the sample is associated with a patient with ICM or DCM</details>## Considerations {#sec-considerations}One significant consideration that we had to navigate during this reportwas the relatively small sample size. There are 16 patients in thisdataset, with 92 Regions of Interest in total. Distributing this out perpatient, each patient would have approximately 6 ROIs per patient.Initially, we determined this would be too small a sample size to make a mixedmodel, which would have required splitting the given data into 16groups, with one group for each patient. We believed that if we had created a mixedmodel, the [[power]{style="text-decoration: underline;"}]{id='power'} of the testswould have been too low, resulting in a statistical test that could noteffectively detect significant results. However, after consulting with our academic advisor, he assured us that the use of a mixed linear model was appropriate in addressing Key Goals 1 and 3 (@sec-kg1 and @sec-kg3). Another consideration was the issue of[[independence]{style="text-decoration: underline;"}]{id='indep'}, which is anassumption of many statistical tests. It can be argued that some ROIsare not independent of each other if they come from the same patient.One way to overcome this independence is to use a mixed model, which we applied for Key Goals 1 and 3.Normality is another assumption of a $t$-test. After a visual assessment of the data, we found that there was not enough evidence to conclude that the assumption of normality is violated. We wereconcerned about potential issues with[[homoskedasticity]{style="text-decoration: underline;"}]{id='homoskedasticity'}. However, afterconsulting with our academic lead, we determined that this issue does not affect the results.We did notice some high leverage points in Neutrophil1 under (@sec-kg3analysis). 
Removing these points does alter the results, so they should be
considered carefully when interpreting these results.

# Analysis

```{r message=FALSE, warning=FALSE}
#| code-summary: "Code: Initial Data Cleaning and Transformation"
patient_df = read_csv('data/ROI.GROUP.AREA.data.csv') |>
  clean_names()

# Fibrosis Score calculation, plus Fibrotic Zone and patient ID extraction
patient_df = patient_df |>
  mutate(fibrotic_score = scar/(scar + myocardium)) |>
  mutate(fibrotic_zone = str_extract(group, "^[^_]*")) |>
  mutate(fibrotic_zone = factor(fibrotic_zone, levels=c('Remote', "IF", 'RF', "RFC"))) |>
  mutate(patient_id = str_extract(sample, "[^_]+$")) |>
  rename(roi=roi_1)

# Mapping Table between Patients, ROIs and Fibrotic Zone
patient_id_roi_fibrotic_zone_mapping_df = patient_df |>
  select(patient_id, roi, fibrotic_zone) |>
  unique()

cell_df = read_csv('data/HFCellPop.csv') |>
  clean_names()

# Cell counts in each ROI
roi_total_cells_count_df = cell_df |>
  select(roi) |>
  group_by(roi) |>
  summarise(roi_cell_count = n()) |>
  ungroup()

# Proportion of each Phenotype in each ROI
cell_proportion_df = cell_df |>
  select(roi, annotated_metacluster, sample, group, aetiology) |>
  group_by(roi, annotated_metacluster, sample, group, aetiology) |>
  summarise(phenotype_cell_count = n()) |>
  ungroup() |>
  mutate(fibrotic_zone = str_extract(group, "^[^_]*")) |>
  mutate(fibrotic_zone = factor(fibrotic_zone, levels=c('Remote', "IF", 'RF', "RFC"))) |>
  mutate(patient_id = str_extract(sample, "[^_]+$")) |>
  left_join(roi_total_cells_count_df) |>
  mutate(phenotype_cell_proportion = phenotype_cell_count/roi_cell_count)

# Simplified dataframe of fibrosis score and cell proportion for each ROI
proportion_score_df = cell_proportion_df |>
  left_join(patient_df |> select(roi, fibrotic_score, fibrotic_zone))

# ICM only Dataframe
icm_disease_cell_proportion_df = cell_proportion_df |>
  filter(aetiology == 'ICM')

annotated_metaclusters = cell_df |>
  select(annotated_metacluster) |>
  unique()

rois = icm_disease_cell_proportion_df |>
  select(roi) |>
  unique()

icm_df = data.frame(roi=c(), phenotype=c(), phenotype_proportion=c())

# One row per ROI+phenotype pair, with value of phenotype proportion
for (phenotype_loop in as.list(annotated_metaclusters$annotated_metacluster)) {
  for (roi_loop in as.list(rois$roi)) {
    df = icm_disease_cell_proportion_df |>
      filter(annotated_metacluster == phenotype_loop, roi == roi_loop) |>
      select(phenotype_cell_proportion)
    # NA when the phenotype is absent from the ROI (replaced with 0 below)
    phenotype_proportion_for_roi = df$phenotype_cell_proportion[1]
    icm_df = rbind(icm_df,
                   data.frame(roi=c(roi_loop),
                              phenotype=c(phenotype_loop),
                              phenotype_proportion=c(phenotype_proportion_for_roi)))
  }
}

# Fills in empty rows with 0 (phenotype not present in ROI)
icm_df = icm_df |>
  replace_na(list(phenotype_proportion=0))

# Adds in patient and Fibrotic Zone info
icm_df = icm_df |>
  left_join(patient_id_roi_fibrotic_zone_mapping_df)

# DCM only Dataframe
dcm_disease_cell_proportion_df = cell_proportion_df |>
  filter(aetiology == 'DCM')

rois = dcm_disease_cell_proportion_df |>
  select(roi) |>
  unique()

dcm_df = data.frame(roi=c(), phenotype=c(), phenotype_proportion=c())

# One row per ROI+phenotype pair, with value of phenotype proportion
for (phenotype_loop in as.list(annotated_metaclusters$annotated_metacluster)) {
  for (roi_loop in as.list(rois$roi)) {
    df = dcm_disease_cell_proportion_df |>
      filter(annotated_metacluster == phenotype_loop, roi == roi_loop) |>
      select(phenotype_cell_proportion)
    dcm_df = rbind(dcm_df,
                   data.frame(roi=c(roi_loop),
                              phenotype=c(phenotype_loop),
                              phenotype_proportion=c(df$phenotype_cell_proportion[1])))
  }
}

# Fills in empty rows with 0 (phenotype not present in ROI)
dcm_df = dcm_df |>
  replace_na(list(phenotype_proportion=0))

# Adds in patient and Fibrotic Zone info
dcm_df = dcm_df |>
  left_join(patient_id_roi_fibrotic_zone_mapping_df)
```

To meet the key goals of this report, we investigated the proportion of
cells that a particular phenotype took up within an ROI. If a phenotype
was not present in an ROI, it was taken that this phenotype took up
$0\%$ of the ROI.
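The zero-filling rule can be illustrated on a small example. The following is a minimal sketch, assuming a hypothetical toy table `toy_cells` with one row per cell; it uses `tidyr::complete()` rather than the loops in the report's own code, so it shows the idea and does not reproduce the report's pipeline.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per cell, labelled by ROI and phenotype
toy_cells = tibble(
  roi = c("A", "A", "A", "B", "B"),
  phenotype = c("Fib1", "Fib1", "M2", "Fib1", "Fib1")
)

toy_proportions = toy_cells |>
  # Count the cells of each phenotype in each ROI
  count(roi, phenotype, name = "phenotype_cell_count") |>
  # Add a row for every ROI/phenotype pair, with a count of 0 where absent
  complete(roi, phenotype, fill = list(phenotype_cell_count = 0L)) |>
  group_by(roi) |>
  # Proportion = cells of the phenotype / all cells in the ROI
  mutate(phenotype_proportion = phenotype_cell_count / sum(phenotype_cell_count)) |>
  ungroup()

toy_proportions
```

In this toy data, M2 does not appear in ROI B, so it receives an explicit row with a proportion of 0 instead of being left out.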
Phenotype proportion within an ROI was defined as
follows:

$$\text{Phenotype Proportion} = \frac{\text{\# Cells of Given Phenotype}}{\text{\# Cells in the ROI}}$$

For Key Goal 3, we also investigated Fibrosis Scores for ROIs. This is
defined as the scarred area divided by the total area of the ROI (not
including background from the cellular marker), as below:

$$\text{Fibrosis Score} = \frac{\text{Scarred area in ROI}}{\text{Total (non-background) area in ROI}}$$

## Key Goal 1 Analysis {#sec-kg1analysis}

**Key Goal 1**: Compare the proportion of cellular phenotypes present
with different Fibrotic Zones **within** ICM patients and within DCM
patients.

**Statistical Approach**: For a given phenotype and a given Aetiology,
create two Linear Mixed Models (LMM) to predict Phenotype Proportion.
Each model includes the random effect of each patient, to account for
the fact that observations from the same patient are not independent.
One of these models, the 'full model', includes the effect of Fibrotic
Zone:

$$\text{Phenotype Proportion} \sim 1 + \text{Fibrotic Zone} + (1|\text{Patient ID})$$

The other model, the 'null model', does not include the effect of
Fibrotic Zone:

$$\text{Phenotype Proportion} \sim 1 + (1|\text{Patient ID})$$

These two LMMs are compared using ANOVA (a likelihood ratio test of the
nested models) to determine whether the effect of Fibrotic Zone on
Phenotype Proportion is significant. If a result is significant, this
means that Phenotype Proportion differs significantly across the
Fibrotic Zones within that aetiology. This has been visualised below
with a boxplot.

**Statistical Assumptions**: For an LMM, it is assumed that the data is
linearly related and error terms are independent, normally distributed
and have constant variance. Visual inspection of the data through
exploratory analysis suggested these assumptions were satisfied.
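The comparison of a null and a full model can be sketched on simulated data. This is an illustrative example only: the data frame `toy`, the effect sizes and the seed are all hypothetical, and the sketch loads `lme4` directly (the package underlying `lmerTest`).

```r
library(lme4)

set.seed(1)

# Hypothetical toy data: 8 patients, each with one ROI in each of 4 zones
toy = expand.grid(
  patient_id = factor(1:8),
  fibrotic_zone = factor(c("Remote", "IF", "RF", "RFC"),
                         levels = c("Remote", "IF", "RF", "RFC"))
)
patient_effect = rnorm(8, sd = 0.02)
zone_effect = c(Remote = 0, IF = 0.03, RF = 0.06, RFC = 0.09)
toy$phenotype_proportion = 0.2 +
  patient_effect[as.integer(toy$patient_id)] +
  unname(zone_effect[as.character(toy$fibrotic_zone)]) +
  rnorm(nrow(toy), sd = 0.02)

# Full model includes Fibrotic Zone; null model does not
m_full = lmer(phenotype_proportion ~ 1 + fibrotic_zone + (1 | patient_id), data = toy)
m_null = lmer(phenotype_proportion ~ 1 + (1 | patient_id), data = toy)

# anova() refits both models by maximum likelihood and reports a
# likelihood ratio (chi-squared) test of the nested models
lrt = anova(m_null, m_full)
lrt
```

The second row of the resulting table carries the chi-squared statistic, its degrees of freedom (three here, one per non-reference zone) and the p-value for the effect of Fibrotic Zone.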
In the ANOVA of these two LMMs, the assumptions of independence,
normality and homogeneity of variance are met.

```{r message=F, warning=F}
#| code-summary: "Code: Linear Mixed Model ANOVA of Phenotype Abundance throughout Fibrotic Zones per Phenotype"
kg1_res_ls = list()     # list for all results
kg1_sig_res_ls = list() # list for significant results

# For each phenotype within each aetiology (ICM or DCM)
for (aetiolog in unique(proportion_score_df$aetiology)){
  aetiology_df = proportion_score_df[proportion_score_df$aetiology == aetiolog, ]
  for (phenotype in unique(aetiology_df$annotated_metacluster)){
    phenotype_df = aetiology_df[aetiology_df$annotated_metacluster == phenotype, ]
    # Only fit if there is more than one observation for phenotype and aetiology
    if (nrow(phenotype_df) > 1){
      m_full = lmer(phenotype_cell_proportion ~ 1 + fibrotic_zone + (1|patient_id), data = phenotype_df)
      m_null = lmer(phenotype_cell_proportion ~ 1 + (1|patient_id), data = phenotype_df)
      # Compare null (not including fibrotic zone) and full (including fibrotic zone) models by ANOVA
      a = anova(m_null, m_full)

      # Store ANOVA results as dataframe
      res = a |> as.data.frame()

      # Side-by-side boxplot
      p = phenotype_df |>
        ggplot() +
        aes(x=fibrotic_zone, y=phenotype_cell_proportion) +
        geom_boxplot() +
        # Individual Data Points
        geom_jitter(size=1, colour=orange, alpha=0.5, height=0, width=0.1) +
        labs(title=paste0(phenotype)) +
        labs(x="Fibrotic Zone", y="Phenotype Proportion") +
        scale_y_continuous(labels = scales::percent) +
        theme(plot.background = element_rect(fill = "#ffffff", linewidth = 0),
              panel.border = element_rect(colour = "black", fill=NA),
              legend.box.background = element_rect(colour = "black"),
              axis.title = element_text(face="bold"),
              plot.title = element_text(face="bold", size = 14, hjust = 0.5))

      rownames(res) = c("Null Model", "Full Model")
      colnames(res) = c("Num. Par.", "AIC", "BIC", "log Lik.", "Deviance", "Chi²", "Df", "Pr(>Chi²)")

      # Store ANOVA results table and boxplot in list labelled under Phenotype name
      kg1_res_ls[[paste(aetiolog, phenotype)]] = list(table = kable(res, digits = c(0,1,1,1,1,1,0,4)), plot = p)

      # Add ANOVA results table and boxplot to list for significant results if ANOVA is significant
      if (res$`Pr(>Chi²)`[2] < 0.05){
        kg1_sig_res_ls[[paste(aetiolog, phenotype)]] = list(table = kable(res, digits = c(0,1,1,1,1,2,0,4)), plot = p)
      }
    }
  }
}

kg1_res_ls_icm = kg1_res_ls[grep("ICM", names(kg1_res_ls))]             # ICM results list
kg1_res_ls_dcm = kg1_res_ls[grep("DCM", names(kg1_res_ls))]             # DCM results list
kg1_sig_res_ls_icm = kg1_sig_res_ls[grep("ICM", names(kg1_sig_res_ls))] # ICM significant results list
kg1_sig_res_ls_dcm = kg1_sig_res_ls[grep("DCM", names(kg1_sig_res_ls))] # DCM significant results list
```

### Within ICM

Within the ICM Aetiology, the effect of Fibrotic Zone on Phenotype
Proportion was significant in the C1, C2b, C3a, CD45RO, Fib2, Fib3,
Fibrocyte2, M2, PopSMA+Fx13a+, ResMac, SMA1, SMA3, C2a, Activated Th and
SMA2 cellular phenotypes.

These significant results are shown below.
Results for all phenotypes can be found under the 'All results'
drop-down below.

::: panel-tabset
```{r}
#| results: asis
# Outputs the info for significant phenotypes
# This gets outputted as raw markdown so the tabs can be made
# .y ~ phenotype
# .x ~ list with the info attached to the phenotype
iwalk(kg1_sig_res_ls_icm, ~ {
  # Drop the aetiology prefix; re-join phenotype names containing spaces
  name = paste(strsplit(.y, split = " ")[[1]][-1], collapse = " ")
  cat('## ', name, '\n\n')
  cat(paste0("<h3>", name, "</h3>"))
  cat("\n\n")
  print(.x$plot)
  cat(paste0('<figcaption class="figure-caption">Figure: Side-by-side boxplot of Phenotype Proportion of ', name, ' against Fibrotic Zone for ICM ROIs.</figcaption> <br>'))
  cat("\n\n")
  cat(paste0("<center>**ANOVA Summary Table for ", name, "**</center>"))
  cat("\n\n")
  print(.x$table)
  cat('\n\n')
})
```
:::

<details>

<summary>All results</summary>

::: panel-tabset
```{r}
#| results: asis
# Outputs the info for all phenotypes
# This gets outputted as raw markdown so the tabs can be made
# .y ~ phenotype
# .x ~ list with the info attached to the phenotype
iwalk(kg1_res_ls_icm, ~ {
  # Drop the aetiology prefix; re-join phenotype names containing spaces
  name = paste(strsplit(.y, split = " ")[[1]][-1], collapse = " ")
  cat('## ', name, '\n\n')
  cat(paste0("<h3>", name, "</h3>"))
  cat("\n\n")
  print(.x$plot)
  cat(paste0('<figcaption class="figure-caption">Figure: Side-by-side boxplot of Phenotype Proportion of ', name, ' against Fibrotic Zone for ICM ROIs.</figcaption> <br>'))
  cat("\n\n")
  cat(paste0("<center>**ANOVA Summary Table for ", name, "**</center>"))
  cat("\n\n")
  print(.x$table)
  cat('\n\n')
})
```
:::

</details>

### Within DCM

Within the DCM Aetiology, the effect of Fibrotic Zone on Phenotype
Proportion was significant in the C1, C2a, CD45RO, Fib3, M2, PopS100+,
PopSMA+Fx13a+, ResMac, SMA1, C4, LymphEndo, SMA3 and Mast cellular
phenotypes.

These significant results are shown below. Results for all
phenotypes can be found under the 'All results' drop-down below.

::: panel-tabset
```{r message=F, warning=F}
#| results: asis
# Outputs the info for significant phenotypes
# This gets outputted as raw markdown so the tabs can be made
# .y ~ phenotype
# .x ~ list with the info attached to the phenotype
iwalk(kg1_sig_res_ls_dcm, ~ {
  # Drop the aetiology prefix; re-join phenotype names containing spaces
  name = paste(strsplit(.y, split = " ")[[1]][-1], collapse = " ")
  cat('## ', name, '\n\n')
  cat(paste0("<h3>", name, "</h3>"))
  cat("\n\n")
  print(.x$plot)
  cat(paste0('<figcaption class="figure-caption">Figure: Side-by-side boxplot of Phenotype Proportion of ', name, ' against Fibrotic Zone for DCM ROIs.</figcaption> <br>'))
  cat("\n\n")
  cat(paste0("<center>**ANOVA Summary Table for ", name, "**</center>"))
  cat("\n\n")
  print(.x$table)
  cat('\n\n')
})
```
:::

<details>

<summary>All results</summary>

::: panel-tabset
```{r}
#| results: asis
# Outputs the info for all phenotypes
# This gets outputted as raw markdown so the tabs can be made
# .y ~ phenotype
# .x ~ list with the info attached to the phenotype
iwalk(kg1_res_ls_dcm, ~ {
  # Drop the aetiology prefix; re-join phenotype names containing spaces
  name = paste(strsplit(.y, split = " ")[[1]][-1], collapse = " ")
  cat('## ', name, '\n\n')
  cat(paste0("<h3>", name, "</h3>"))
  cat("\n\n")
  print(.x$plot)
  cat(paste0('<figcaption class="figure-caption">Figure: Side-by-side boxplot of Phenotype Proportion of ', name, ' against Fibrotic Zone for DCM ROIs.</figcaption> <br>'))
  cat("\n\n")
  cat(paste0("<center>**ANOVA Summary Table for ", name, "**</center>"))
  cat("\n\n")
  print(.x$table)
  cat('\n\n')
})
```
:::

</details>

## Key Goal 2 Analysis {#sec-kg2analysis}

**Key Goal 2**: Compare the proportion of cellular phenotypes present
within different Fibrotic Zones **between** ICM and DCM patients.

**Statistical Approach**: For a given phenotype and a given Fibrotic
Zone, use a Welch two-sample $t$-test to test if there is a significant
difference in mean phenotype proportion between ICM and DCM ROIs.

To account for the multiple tests being completed, a False Discovery
Rate (FDR) p-value correction was applied to the test results.
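As a small illustration of such a correction, the sketch below applies base R's `p.adjust()` to a hypothetical vector of raw p-values (`p_raw` is made up for the example and is not taken from the study data). The `'fdr'` method is the Benjamini-Hochberg adjustment; Bonferroni is shown alongside for comparison.

```r
# Hypothetical raw p-values from five tests
p_raw = c(0.001, 0.012, 0.03, 0.04, 0.2)

# Benjamini-Hochberg FDR adjustment ('fdr' is an alias for 'BH' in p.adjust)
p_fdr = p.adjust(p_raw, method = "fdr")

# Bonferroni adjustment for comparison (each p-value multiplied by the number of tests, capped at 1)
p_bonf = p.adjust(p_raw, method = "bonferroni")

data.frame(p_raw, p_fdr, p_bonf)
```

The FDR-adjusted values are never larger than the Bonferroni-adjusted ones, which is why an FDR correction retains more power when many tests are run.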
The correction controls the FDR within each Phenotype Group (isolating
each Fibrotic Zone). Therefore, a significant adjusted p-value means that
mean phenotype proportion is significantly different between aetiologies
for that Fibrotic Zone.

**Statistical Assumptions**: The assumptions of a Welch two-sample
$t$-test include normality, independence and no outliers in each group.
According to the Central Limit Theorem, we consider this data
sufficiently normally distributed and do not believe there are
sufficient outliers to impact the analysis results. In terms of
independence, in this case, we consider separation within each phenotype
and aetiology to be sufficient to ensure independence of results.

The results show seven significant results across two Fibrotic Zones (RF
and IF) and four different phenotypes (M2, Fib3, LymphEndo and C3b). A
significant result indicates that the relative mean phenotype proportion
within that Fibrotic Zone is significantly different between the ICM and
DCM Aetiologies. These significant results are shown in the table below,
with all results available in the drop-down.

```{r warning=FALSE}
#| code-summary: "Code: Compute differences in phenotype proportion between ICM and DCM patients"
between_disease_t = data.frame(matrix(ncol = 4))
colnames(between_disease_t) = c('Phenotype', 'Fibrotic Zone', 'Test Statistic', 'p-value')
metaclusters = unique(cell_proportion_df$annotated_metacluster)
total_tests = 0
dcm_df$aetiology = "DCM"
icm_df$aetiology = "ICM"
fibrotix_zone_list = c('Remote', 'IF', 'RF', 'RFC')
all_roi_df = rbind(dcm_df, icm_df)

# For each phenotype and fibrotic zone
for (p in 1:length(metaclusters)) {
  phenotype_df = all_roi_df |> filter(phenotype==metaclusters[p])
  for (i in 1:length(fibrotix_zone_list)){
    score_phenotype_df = phenotype_df |> filter(fibrotic_zone == fibrotix_zone_list[i])
    score_phenotype_df_icm = score_phenotype_df |> filter(aetiology == 'ICM')
    score_phenotype_df_dcm = score_phenotype_df |> filter(aetiology == 'DCM')

    # Ensure that there is more than one observation in each aetiology before t test
    if ((dim(score_phenotype_df_icm)[[1]] > 1) & (dim(score_phenotype_df_dcm)[[1]] > 1)){
      total_tests = total_tests + 1
      t_between = t.test(score_phenotype_df_icm$phenotype_proportion,
                         score_phenotype_df_dcm$phenotype_proportion)
      between_disease_t[nrow(between_disease_t) + 1, ] = c(metaclusters[p],
                                                           fibrotix_zone_list[i],
                                                           as.numeric(t_between$statistic),
                                                           as.numeric(t_between$p.value))
    }
  }
}

between_disease_t$`p-value` = as.numeric(between_disease_t$`p-value`)
between_disease_t$`Test Statistic` = as.numeric(between_disease_t$`Test Statistic`)

# FDR p-value adjustment (for all tests)
between_disease_t$`Adj. p-value (FDR)` = p.adjust(between_disease_t$`p-value`, method = 'fdr')
# Drops the empty placeholder row created when the data frame was initialised
between_disease_t = na.omit(between_disease_t)

# Find FDR and Bonferroni adjusted p-values for each Phenotype group
between_disease_t = between_disease_t %>%
  group_by(Phenotype) %>%
  mutate(`Adj. p-value (FDR) (Within Fibrotic Zone)` = p.adjust(`p-value`, method = 'fdr'),
         `Adj. p-value (Bonferroni) (Within Fibrotic Zone)` = p.adjust(`p-value`, method = 'bonferroni')) %>%
  ungroup()

between_disease_t$`Adj. p-value (FDR) (Within Fibrotic Zone)` = as.numeric(between_disease_t$`Adj. p-value (FDR) (Within Fibrotic Zone)`)
between_disease_t = between_disease_t |>
  mutate(across(where(is.numeric), \(x) round(x, 4)))

# Only include significant (adjusted p-value < 0.05) values and order from smallest to largest
sig_between_disease_t = between_disease_t |>
  filter(if_all(everything(), ~ !is.na(.x) & .x != "")) |>
  arrange(`Adj. p-value (FDR) (Within Fibrotic Zone)`) |>
  filter(`Adj. p-value (FDR) (Within Fibrotic Zone)` < 0.05)

# Store results in list for tabset output
kg2list = list()
for (z in unique(sig_between_disease_t$`Fibrotic Zone`)) {
  z_tab = sig_between_disease_t[sig_between_disease_t$`Fibrotic Zone` == z, ]
  kg2list[[z]] = list(table = kable(select(z_tab, -c('Adj. p-value (Bonferroni) (Within Fibrotic Zone)'))))
}
```

::: panel-tabset
```{r}
#| results: asis
# Outputs the info for significant differences
# This gets outputted as raw markdown so the tabs can be made
# .y ~ fibrotic zone
# .x ~ list with the info attached to the fibrotic zone
iwalk(kg2list, ~ {
  cat('## ', .y, '\n\n')
  cat(paste0("<h3>", .y, "</h3>"))
  cat("\n\n")
  print(.x$table)
  cat('\n\n')
})
```
:::

<details>

<summary>All results</summary>

```{r message = F, warning = F}
datatable(between_disease_t |>
            drop_na() |>
            arrange(`Adj. p-value (FDR) (Within Fibrotic Zone)`))
```

</details>

## Key Goal 3 Analysis {#sec-kg3analysis}

**Key Goal 3**: Compare the relationship between phenotype abundance and
fibrosis score within DCM and ICM patients.

**Statistical Approach**: For a given phenotype and a given Aetiology,
fit an LMM regression between phenotype proportion and Fibrosis Score
considering the random effects of each patient, to determine if this
relationship is significant.

Before investigating the relationship between phenotype abundance and
Fibrosis Score, the distribution of Fibrosis Scores within each Fibrotic
Zone and Aetiology was explored.

We investigated Fibrosis Score distribution throughout the four Fibrotic
Zones (Remote, IF, RF, RFC) both between and within each of the
Aetiologies.

```{r}
#| code-summary: "Code: Side-by-side boxplots for Fibrosis Score and Fibrotic Zone"
#| fig-cap: "Side-by-side boxplot of Fibrosis Score against Fibrotic Zone faceted by Aetiology."
patient_df |>
  ggplot() +
  aes(x = fibrotic_zone, y = fibrotic_score) +
  geom_boxplot() +
  # Individual data points to give indication of distribution
  geom_jitter(size=1, colour=orange, alpha=0.5, height=0, width=0.1) +
  labs(x="Fibrotic Zone",
       y="Fibrosis Score",
       title="Summary of Fibrosis Score Distribution") +
  theme(plot.background = element_rect(fill = "#ffffff", linewidth = 0),
        panel.border = element_rect(colour = "black", fill=NA),
        legend.box.background = element_rect(colour = "black"),
        axis.title = element_text(face="bold"),
        plot.title = element_text(face="bold", size = 14, hjust = 0.5)) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(vars(aetiology))
```

Three sets of Welch two-sample $t$-tests were conducted. Within each set
of tests, a Bonferroni Correction has been used to adjust the
$p$-values.

**Statistical Assumptions**: As addressed above, the assumptions for a
two-sample Welch $t$-test include independence, no outliers and
normality, all of which have been sufficiently met, see
(@sec-kg2analysis).

```{r}
#| code-summary: "Code: Pairwise Welch two-sample t-tests on fibrosis score in different fibrotic zones"
dat_icm = patient_df[patient_df$aetiology == 'ICM', ]
dat_dcm = patient_df[patient_df$aetiology == 'DCM', ]

within_icm = data.frame(matrix(ncol = 4))
colnames(within_icm) = c('Fibrotic Zone 1', 'Fibrotic Zone 2', 'Test Statistic', 'p-value')
within_dcm = data.frame(matrix(ncol = 4))
colnames(within_dcm) = c('Fibrotic Zone 1', 'Fibrotic Zone 2', 'Test Statistic', 'p-value')

# Testing for significant differences in Fibrosis Score between
# the 4 different fibrotic zones
for (i in 1:(length(fibrotix_zone_list)-1)){
  # Get the data from adjacent fibrotic zones (in terms of level of fibrosis)
  icm_lower = dat_icm[dat_icm$fibrotic_zone == fibrotix_zone_list[i], ]
  icm_upper = dat_icm[dat_icm$fibrotic_zone == fibrotix_zone_list[i+1], ]
  # Welch t-test
  t_icm = t.test(icm_lower$fibrotic_score, icm_upper$fibrotic_score)
  # Saves info to table
  within_icm[i,] = c(fibrotix_zone_list[i],
                     fibrotix_zone_list[i+1],
                     as.numeric(t_icm$statistic),
                     as.numeric(t_icm$p.value))

  # Get the data from adjacent fibrotic zones (in terms of level of fibrosis)
  dcm_lower = dat_dcm[dat_dcm$fibrotic_zone == fibrotix_zone_list[i], ]
  dcm_upper = dat_dcm[dat_dcm$fibrotic_zone == fibrotix_zone_list[i+1], ]
  # Welch t-test
  t_dcm = t.test(dcm_lower$fibrotic_score, dcm_upper$fibrotic_score)
  # Saves info to table
  within_dcm[i,] = c(fibrotix_zone_list[i],
                     fibrotix_zone_list[i+1],
                     as.numeric(t_dcm$statistic),
                     as.numeric(t_dcm$p.value))
}

within_icm$`p-value` = as.numeric(within_icm$`p-value`)
within_icm$`Test Statistic` = as.numeric(within_icm$`Test Statistic`) |> round(2)
within_dcm$`p-value` = as.numeric(within_dcm$`p-value`)
within_dcm$`Test Statistic` = as.numeric(within_dcm$`Test Statistic`) |> round(2)

# Applying Bonferroni Correction (3 tests per aetiology, capped at 1)
within_icm = within_icm |>
  mutate(`Adj. p-value` = round(pmin(1, `p-value`*3), 4))
within_dcm = within_dcm |>
  mutate(`Adj. p-value` = round(pmin(1, `p-value`*3), 4))
```

```{r}
#| code-summary: "Code: Welch t-Test of fibrosis score in each fibrotic zone between ICM and DCM patients"
between = data.frame(matrix(ncol = 3))
colnames(between) = c('Fibrotic Zone', 'Test Statistic', 'p-value')

# Testing for significant differences in fibrosis Score between
# Aetiology for the 4 different fibrotic zones
for (i in 1:(length(fibrotix_zone_list))){
  icm_score = dat_icm[dat_icm$fibrotic_zone == fibrotix_zone_list[i], ]
  dcm_score = dat_dcm[dat_dcm$fibrotic_zone == fibrotix_zone_list[i], ]
  # Welch t-test
  t_between = t.test(icm_score$fibrotic_score, dcm_score$fibrotic_score)
  # Saves info to table
  between[i,] = c(fibrotix_zone_list[i],
                  as.numeric(t_between$statistic),
                  as.numeric(t_between$p.value))
}

between$`p-value` = as.numeric(between$`p-value`)
between$`Test Statistic` = as.numeric(between$`Test Statistic`) |> round(2)

# Applying Bonferroni Correction (4 tests, capped at 1)
between = between |>
  mutate(`Adj. p-value` = ifelse(`p-value` > 0.25, 1, round(`p-value`*4, 4)))
```

::: panel-tabset
## Within DCM

```{r}
#| code-summary: ""
#| tbl-cap: "Pairwise Welch t-Test summary table for DCM ROIs."
kable(within_dcm, digits = 4)
```

## Within ICM

```{r}
#| code-summary: ""
#| tbl-cap: "Pairwise Welch t-Test summary table for ICM ROIs."
kable(within_icm, digits = 4)
```

## Between ICM and DCM

```{r}
#| code-summary: ""
#| tbl-cap: "Welch t-Test between ICM and DCM ROIs summary table for each Fibrotic Zone."
kable(between, digits=4)
```
:::

Within the DCM Aetiology, the average Fibrosis Score increases between
each of the Fibrotic Zones, and the difference between each zone is
significant (as shown by $t$-test results).

Within the ICM Aetiology, average Fibrosis Scores increase from Remote
to IF and from IF to RF; however, there is not a significant difference
in average Fibrosis Scores between the RF and RFC Fibrotic Zones. This is
important to consider in further analysis given that the Fibrotic Zone
is used as an ordinal measure of fibrosis within a region.

When comparing the average Fibrosis Score for a Fibrotic Zone, there is
no significant difference between ICM and DCM patients.

To investigate the non-significant difference in mean between ICM's RF
and RFC Fibrotic Zones, a Two-Way ANOVA was conducted. This tested
whether there was an interaction effect between the disease Aetiology
and Fibrotic Zone influencing Fibrosis Score. In other words, it tested
whether the relationship between Fibrotic Zone and Fibrosis Score is
significantly different in the two Aetiology groups. The test was
The test wasblocked by Patient ID to reduce individual patient discrepancies(sometimes known as a repeated Two-Sample ANOVA).**Statistical Assumptions**: The assumptions for Two-Way ANOVA are:include independence of variables, homoscedasticity and normality.Independence is sufficiently ensured in the use of patient ID as ablocking factor and normality is met by the Central Limit Theorem.Homoscedacity is sufficient for this analysis.```{r}#| code-summary: "Code: Two-way ANOVA for Interaction Effect between Aetiology and Fibrotic Zone"#| tbl-cap: "Two-way ANOVA summary table."# Two way ANOVAanova_df=tidy(aov(fibrotic_score ~ patient_id + aetiology*fibrotic_zone, data=patient_df))colnames(anova_df) =c("term", "Df", "Sum Sq", "Mean Sq", "F value", "Pr(>F)")anova_df = anova_df[c("term", "Sum Sq","Df", "F value", "Pr(>F)")] |>mutate(term =str_replace_all(term, "_", " "),term =str_replace_all(term, ":", " : "),term = tools::toTitleCase(term),term =str_replace_all(term, "Id", "ID"),)colnames(anova_df) =c("", "Sum Sq","Df", "F value", "Pr(>F)")kable(anova_df, digits=4)``````{r warning=FALSE, message=FALSE}#| code-summary: "Code: Interaction Effect Plot"#| fig-cap: "Interaction effect trace plot for two-way ANOVA."# Interaction Effect Plotemmip(aov(fibrotic_score ~ patient_id + aetiology:fibrotic_zone, data=patient_df), aetiology ~ fibrotic_zone) + theme_classic(base_size = 12) + labs(x="Fibrotic Zone", y="Linear Prediciton", title="ANOVA Interaction Effect Plot", colour="Aetiology") + theme(plot.background = element_rect(fill = "#ffffff", linewidth = 0), panel.border = element_rect(colour = "black", fill=NA), legend.box.background = element_rect(colour = "black"), axis.title = element_text(face="bold"), plot.title = element_text(face="bold", size = 14, hjust = 0.5))```With a $p$-value of `r anova_df[["Pr(>F)"]][3] |> round(4)`, this testshowed a significant interaction effect. 
That is, the relationshipbetween Fibrosis Score and Fibrotic Zone is different between ICM andDCM patients. This is demonstrated by the trace plot, showing that thischange in the relationship between each Aetiology occurs between the RFand RFC Fibrotic Zones. This may point to the misclassification of someROI zones, which can be investigated further.To investigate the relationship between phenotype abundance and FibroticScore, a Linear Mixed Model (LMM) was fitted accounting for the random effects of eachpatient. The associated $p$-values of the model's coefficients were usedto calculate significance.**Statistical Assumptions**: As for Key Goal 1 (@sec-kg1analysis), LMMsassume a linear relationship and independent, normally distributed errorterms with constant variance. Again, Visual inspection of the datathrough exploratory analysis ensured these assumptions have beensatisfied.```{r message=FALSE, warning=FALSE}#| code-summary: "Code: Linear Mixed Model between phenotype abundance and fibrosis score"res_ls = list()sig_res_ls = list()for (aetiolog in unique(proportion_score_df$aetiology)){ aetiology_df = proportion_score_df[proportion_score_df$aetiology == aetiolog, ] for (phenotype in unique(aetiology_df$annotated_metacluster)){ phenotype_df = aetiology_df[aetiology_df$annotated_metacluster == phenotype, ] if (nrow(phenotype_df) > 1){ m = lmer(phenotype_cell_proportion ~ fibrotic_score + (1|patient_id), data = phenotype_df) phenotype_df$fit = predict(m) p = ggplot(phenotype_df) + aes(y=phenotype_cell_proportion, x=fibrotic_score, colour = patient_id) + geom_smooth(aes(y=fit, group=patient_id), method='lm', se = F) + geom_point() + labs(x="Phenotype Proportion", y="Fibrosis Score", title=paste(phenotype)) + theme(plot.background = element_rect(fill = "#ffffff", linewidth = 0), panel.border = element_rect(colour = "black", fill=NA), legend.box.background = element_rect(colour = "black"), axis.title = element_text(face="bold"), plot.title = element_text(face="bold", 
size = 14, hjust = 0.5)) + labs(colour = "Patient ID") res = lmerTest:::get_coefmat(m) |> as.data.frame() rownames(res) = c("(Intercept)", "Fibrosis Score") eq = paste0("$$\\text{Prop. ", phenotype, "}= \\beta_{\\text{Patient ID}}", ifelse(res['Fibrosis Score', "Estimate"]>=-0.0005, "+", ""), round(res['Fibrosis Score', "Estimate"], 3), "\\times \\text{Fribrosis Score}$$") res_ls[[paste(aetiolog, phenotype)]] = list(table = res,equation = eq) sig_res = res[res$`Pr(>|t|)` < 0.05, ] if (nrow(sig_res)>1){ sig_res_ls[[paste(aetiolog, phenotype)]] = list(table = sig_res, equation = eq, plot = p) } } }}res_ls_icm = res_ls[grep("ICM", names(res_ls))] res_ls_dcm = res_ls[grep("DCM", names(res_ls))] sig_res_ls_icm = sig_res_ls[grep("ICM", names(sig_res_ls))] sig_res_ls_dcm = sig_res_ls[grep("DCM", names(sig_res_ls))] ```### ICM PatientsWithin the ICM aetiology, the effect of Fibrosis Score on phenotypeproportion was significant, and negative for C1, C2b, C3a, Fib1 and C2a.This effect was significant and positive for Neutrophil1.::: panel-tabset```{r message=F, warning=F}#| results: asis# Outputs the info for significant phenotypes# This gets outputted as raw markdown so the tabs can be made# .y ~ phenotype# .x ~ list with the info attached to the phenotypeiwalk(sig_res_ls_icm, ~ { name = strsplit(.y, split = " ")[[1]][-1] cat('## ', name, '\n\n') cat(paste0("<h3>", name, "</h3>")) cat("\n\n") print(.x$plot) cat(paste0('<figcaption class="figure-caption">Figure: Scatter Plot of Phenotype Proportion of ', name, ' against Fibrosis Score for ICM ROIs.</figcaption>')) cat("\n\n") cat(.x$equation) cat("<br>") cat("\n\n") print(kable(.x$table, digits=4)) cat('\n\n')})```:::<details><summary>All results</summary>::: panel-tabset```{r message=F, warning=F}#| results: asis# Outputs the info for significant phenotypes# This gets outputted as raw markdown so the tabs can be made# .y ~ phenotype# .x ~ list with the info attached to the phenotypeiwalk(res_ls_icm, ~ { name = strsplit(.y, 
split = " ")[[1]][-1] cat('## ', name, '\n\n') cat(paste0("<h3>", name, "</h3>")) cat("\n\n") print(.x$plot) cat(paste0('<figcaption class="figure-caption">Figure: Scatter Plot of Phenotype Proportion of ', name, ' against Fibrosis Score for ICM ROIs.</figcaption>')) cat("\n\n") cat(.x$equation) cat("<br>") cat("\n\n") print(kable(.x$table, digits=4)) cat('\n\n')})```:::</details> ### DCM PatientsWithin the DCM aetiology, the effect of fibrosis score on phenotypeproportion was significant and negative for C1, C2a, C3a, Fib1, C2b, C4and Mast. This effect was significant and positive for M2.::: panel-tabset```{r message=F, warning=F}#| results: asis# Outputs the info for significant phenotypes# This gets outputted as raw markdown so the tabs can be made# .y ~ phenotype# .x ~ list with the info attached to the phenotypeiwalk(sig_res_ls_dcm, ~ { name = strsplit(.y, split = " ")[[1]][-1] cat('## ', name, '\n\n') cat(paste0("<h3>", name, "</h3>")) cat("\n\n") print(.x$plot) cat(paste0('<figcaption class="figure-caption">Figure: Scatter Plot of Phenotype Proportion of ', name, ' against Fibrosis Score for ICM ROIs.</figcaption>')) cat("\n\n") cat(.x$equation) cat("<br>") cat("\n\n") print(kable(.x$table, digits=4)) cat('\n\n')})```:::<details><summary>All results</summary>::: panel-tabset```{r message=F, warning=F}#| results: asis# Outputs the info for significant phenotypes# This gets outputted as raw markdown so the tabs can be made# .y ~ phenotype# .x ~ list with the info attached to the phenotypeiwalk(res_ls_icm, ~ { name = strsplit(.y, split = " ")[[1]][-1] cat('## ', name, '\n\n') cat(paste0("<h3>", name, "</h3>")) cat("\n\n") print(.x$plot) cat(paste0('<figcaption class="figure-caption">Figure: Scatter Plot of Phenotype Proportion of ', name, ' against Fibrosis Score for ICM ROIs.</figcaption>')) cat("\n\n") cat(.x$equation) cat("<br>") cat("\n\n") print(kable(.x$table, digits=4)) cat('\n\n')})```:::</details> # ConclusionIn addressing Key Goal 1 (@sec-kg1), 
investigation of phenotype proportion across Fibrotic Zones demonstrated
a significant difference in mean phenotype proportion between at least
one pair of Fibrotic Zones for each of the following phenotypes within
each aetiology:

- ICM: C1, C2b, C3a, CD45RO, Fib2, Fib3, Fibrocyte2, M2, PopSMA+Fx13a+, ResMac, SMA1, SMA3, C2a, Activated Th and SMA2.
- DCM: C1, C2a, CD45RO, Fib3, M2, PopS100+, PopSMA+Fx13a+, ResMac, SMA1, C4, LymphEndo, SMA3 and Mast.

This reveals that phenotype proportion varies significantly between
Fibrotic Zones (for at least one pair of zones) for each of the above
phenotypes in their respective disease aetiology.

Regarding Key Goal 2 (@sec-kg2), the comparison of phenotype proportion
within Fibrotic Zones between Aetiologies (@sec-kg2analysis) revealed
significant differences in M2, LymphEndo, Fib3 and C3b in the IF
Fibrotic Zone and Fib3, LymphEndo and C3b in the RF Fibrotic Zone.

For Key Goal 3 (@sec-kg3), linear mixed model regression revealed the
following significant linear relationships between phenotype abundance
and Fibrosis Score within each Aetiology:

- ICM:
    - Positive: Neutrophil1.
    - Negative: C1, C2a, C3a, Fib1 and C2b.
- DCM:
    - Positive: M2.
    - Negative: C1, C2a, C3a, Fib1, C2b, C4 and Mast.

Comparing the two aetiologies, both ICM and DCM show negative linear
relationships between phenotype abundance and Fibrosis Score for C1,
C2a, C3a, Fib1 and C2b.

For further analysis and investigation, additional code details relating
to all results are in the appendix.

# Appendix {#sec-app}

<details>

<summary>Exploratory Data Analysis</summary>

The following column graph demonstrates cell proportion within each
disease Aetiology for each Fibrotic Zone and Cellular Phenotype. The
graph highlights that:

- across all zones and both aetiologies, the proportion of the Fib1 phenotype is high.
- in both aetiologies, as Fibrotic Zone increases, the proportion of:
    - SMA1 decreases.
    - B1 increases.

The graph also details the number of samples within each category, to
highlight the differences in sample numbers based on individual patient
data. Evidently, samples are most abundant in the IF Fibrotic Zone for
ICM and DCM, as well as the RF Zone for ICM.

```{r}
#| code-summary: "Code: EDA Plot"
# Average phenotype proportion and number of samples per category
counts = cell_proportion_df %>%
  group_by(aetiology, fibrotic_zone, annotated_metacluster) %>%
  summarise(avg_cell_proportion = mean(phenotype_cell_proportion),
            count = n(),
            .groups = 'drop')

ggplot(counts) +
  geom_bar(aes(x = avg_cell_proportion, y = annotated_metacluster, fill = count),
           stat='identity', colour='black') +
  scale_fill_gradient2(low=lightorange, high=orange) +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size = 6)) +
  facet_grid(fibrotic_zone ~ aetiology) +
  ylab('Cellular Phenotype') +
  xlab('Cell Proportion') +
  labs(fill = 'Number of Samples') +
  scale_x_continuous(labels = scales::percent) +
  theme(plot.background = element_rect(fill = "#ffffff", linewidth = 0),
        panel.border = element_rect(colour = "black", fill=NA),
        legend.box.background = element_rect(colour = "black"),
        axis.title = element_text(face="bold"),
        plot.title = element_text(face="bold", size = 14, hjust = 0.5))
```

</details>

---
nocite: |
  @*
---